@ncw
Created July 21, 2020 20:45
Response to File System Interfaces for Go proposal

This is going to be a long reply based on my experience with rclone: https://rclone.org. Rclone interfaces with 30-ish different cloud providers and uses some very similar interfaces, so I feel I might have something to contribute!

As part of rclone I have made very similar interfaces. In fact there are two: a lower level one matched to the typical things cloud storage systems can do (eg read as streams, use Range requests, no seeking during writes), and a more general purpose one which can do everything a normal file system can (eg seek while writing, ReadAt, WriteAt, etc).

The low level interface is defined here and is described in more detail a bit later.

The high level interface is defined here - it wraps a low level interface to make a high level one. This interface looks very much like the services provided by the os package and is used to pass to things like golang.org/x/net/webdav.

Extension interfaces and the extension pattern

Extension (or optional as I usually call them) interfaces are a big maintenance burden - wrapping them is really hard.

If one file system wants to wrap another (let's say we are implementing a file system which encrypts any other file system), it has to implement all the optional methods to be most useful. However, the file system it is wrapping may not be known until run time and may not implement all the optional methods.

We can work around this problem if we define an error that optional interfaces must return when they aren't implemented - say os.ErrNotImplemented - and document this. I think having a sentinel error for this is really important.
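
To make that concrete, here is a minimal sketch of how a caller or a wrapping file system could use such a sentinel. The proposed fs.FS is assumed; ErrNotImplemented and the RemoveFS extension interface are invented for illustration.

package fsutil // hypothetical helper package

import (
	"errors"
	"io/fs"
)

// ErrNotImplemented is a hypothetical sentinel which optional methods
// return when the underlying file system does not support the operation.
var ErrNotImplemented = errors.New("operation not implemented")

// RemoveFS is a hypothetical extension interface for file systems which
// can delete files.
type RemoveFS interface {
	fs.FS
	Remove(name string) error
}

// Remove calls the optional method if it is present. A wrapper which
// implements RemoveFS but wraps something which doesn't can return
// ErrNotImplemented itself, and errors.Is lets callers detect that.
func Remove(fsys fs.FS, name string) error {
	rfsys, ok := fsys.(RemoveFS)
	if !ok {
		return ErrNotImplemented
	}
	return rfsys.Remove(name)
}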

This has been a big problem in rclone, and I ended up giving up on optional interfaces in the low level interface and going with function pointers - classic vtable style - so that I can see whether a method is supported without calling it. Sometimes you need to do a lot of setup to call a method, and finding out it isn't supported only when you call it is too late.

(This could also be fixed by a call which could remove a method from an instance of an interface's method set.)

I'm not suggesting function pointers are the right choice here - I just want to make sure everyone understands the burden that optional interfaces cause.
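
For what it's worth, here is a rough sketch of the function-pointer style, loosely modelled on what rclone does (the names and shapes are simplified, not rclone's actual API). The point is that a nil field says "not supported" before any work is done, and a wrapper only copies across the pointers its inner file system actually provides.

package features // hypothetical package for this sketch

import "context"

// Object stands in for the object type described later in this comment.
type Object interface {
	Remote() string
}

// Features holds one nilable function pointer per optional operation -
// the classic vtable style.
type Features struct {
	Purge func(ctx context.Context, dir string) error
	Copy  func(ctx context.Context, src Object, remote string) (Object, error)
}

// Wrap builds the vtable for a wrapping file system: an operation is only
// offered if the inner file system offers it, so support can be checked
// with a nil test instead of a failed call.
func Wrap(inner *Features) *Features {
	outer := &Features{}
	if inner.Purge != nil {
		outer.Purge = func(ctx context.Context, dir string) error {
			// wrapper-specific work (eg encrypting the directory name)
			// would go here before delegating
			return inner.Purge(ctx, dir)
		}
	}
	if inner.Copy != nil {
		outer.Copy = inner.Copy
	}
	return outer
}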

io/fs

As others have mentioned, using the name "fs" would cause trouble because it collides with so many existing packages.

Here are some examples from my GOPATH:

code.google.com/p/rsc/appfs/fs
bazil.org/fuse/fs
github.com/hanwen/go-fuse/fs
github.com/hanwen/go-mtpfs/fs
github.com/dotcloud/docker/pkg/symlink/testdata/fs
github.com/dotcloud/docker/pkg/libcontainer/cgroups/fs
github.com/tools/godep/vendor/github.com/kr/fs
github.com/kr/fs
github.com/rclone/rclone/fs
github.com/prometheus/procfs/internal/fs
github.com/minio/minio/docs/zh_CN/backend/fs
github.com/golang/dep/internal/fs
github.com/restic/restic/internal/fs
github.com/opencontainers/runc/libcontainer/cgroups/fs
github.com/Xuanwo/storage/services/fs

Rclone's main interface is called fs.Fs so this would be really painful for me in particular!

File name syntax

To quote the proposal: "The use of unrooted names—x/y/z.jpg instead of /x/y/z.jpg—is meant to make clear that the name is only meaningful when interpreted relative to a particular file system root, which is not specified in the name."

This is the scheme rclone uses, however note that some file systems really need that leading / to be available. For example with SFTP, a path with a leading / refers to the root of the disk, while a path without it is relative to the user's home directory.

ReadFile

I don't think this is worth making an interface for. Reading a whole file in one call isn't a common operation, and the files read that way are by definition small. Since the FS has to implement Open/Read/Close anyway, I don't think it will save much code.
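
For comparison, this is roughly all the code a generic fallback needs once Open exists (assuming the proposed fs.FS; io.ReadAll is used for brevity):

package readfile // hypothetical package for this sketch

import (
	"io"
	"io/fs"
)

// readFile opens the named file, reads everything and closes it - which
// is why a dedicated ReadFile interface doesn't save much code.
func readFile(fsys fs.FS, name string) ([]byte, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}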

Walk

Rclone spends a lot of effort exposing the providers' ability to do recursive directory listings very fast.

For example, an S3 bucket containing n files can be listed recursively in n/1000 API calls, whereas listing each directory individually takes as many API calls as there are directories and is in general much, much slower.

Exposing a recursive directory listing would be a useful optimization for cloud storage backends.
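
A hedged sketch of what that could look like as an extension interface - ReadDirRecursiveFS and its method are invented names, modelled loosely on rclone's ListR, with fs.WalkDir standing in for the slow per-directory fallback:

package listr // hypothetical package for this sketch

import "io/fs"

// ReadDirRecursiveFS is a hypothetical extension interface for backends
// (S3, GCS, ...) which can list a whole subtree in a few API calls.
type ReadDirRecursiveFS interface {
	fs.FS
	// ReadDirRecursive calls fn with batches of entries found under dir,
	// typically one batch per API page.
	ReadDirRecursive(dir string, fn func(entries []fs.DirEntry) error) error
}

// walkAll uses the fast path when the backend offers one and falls back
// to walking directory by directory otherwise.
func walkAll(fsys fs.FS, dir string, fn func(entries []fs.DirEntry) error) error {
	if rfsys, ok := fsys.(ReadDirRecursiveFS); ok {
		return rfsys.ReadDirRecursive(dir, fn)
	}
	return fs.WalkDir(fsys, dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		return fn([]fs.DirEntry{d})
	})
}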

Glob

Glob is an interesting idea. It would be hard to implement for cloud providers though, as I can't think of any which support the exact semantics of glob. Some of them do allow a filtered directory listing (eg Google Drive), but only using Google Drive's very specific search semantics.

On balance my feeling is that Glob isn't useful enough.

Comparison with Rclone's interfaces

Here is a list of Rclone's low level interface methods. I've included these as I think they might be useful to guide the thinking about what you can do with cloud and network storage. (I've removed some rclone-specific things from these.)

Fs: Mandatory methods

// Precision of the ModTimes in this Fs - this is important for syncing
Precision() time.Duration

// List the objects and directories in dir into entries.  The
// entries can be returned in any order but should be for a
// complete directory.
List(ctx context.Context, dir string) (entries DirEntries, err error)

// NewObject finds the Object at remote
NewObject(ctx context.Context, remote string) (Object, error)

// Put in to the remote path with the modTime and size given in src
Put(ctx context.Context, in io.Reader, src ObjectInfo, options ...OpenOption) (Object, error)

// Mkdir makes the directory (container, bucket)
Mkdir(ctx context.Context, dir string) error

// Rmdir removes the directory (container, bucket) if empty
Rmdir(ctx context.Context, dir string) error

Fs: Optional methods

// Purge all files in dir and the dir itself (think "rm -rf dir")
Purge func(ctx context.Context, dir string) error

// Copy src to this remote using server side copy operations.
Copy func(ctx context.Context, src Object, remote string) (Object, error)

// Move src to this remote using server side move operations. (Rename essentially)
Move func(ctx context.Context, src Object, remote string) (Object, error)

// DirMove moves src, srcRemote to this remote at dstRemote
// using server side move operations. (Rename a directory)
DirMove func(ctx context.Context, src Fs, srcRemote, dstRemote string) error

// PutStream uploads to the remote path with the modTime given of indeterminate size
//
// Some backends can't upload files unless they know how big they are in advance
// This is potentially a problem
PutStream func(ctx context.Context, in io.Reader, src ObjectInfo, options ...OpenOption) (Object, error)

// ListR lists the objects and directories of the Fs starting
// from dir recursively into out.
ListR func(ctx context.Context, dir string, callback func(entries DirEntries) error) error

// OpenWriterAt opens with a handle for random access writes
OpenWriterAt func(ctx context.Context, remote string, size int64) (WriterAtCloser, error)

Object: Mandatory methods

// String returns a description of the Object
String() string

// Remote returns the remote path
Remote() string

// ModTime returns the modification date of the file
// It should return a best guess if one isn't available
ModTime(context.Context) time.Time

// Size returns the size of the file
Size() int64

// SetModTime sets the metadata on the object to set the modification date
SetModTime(ctx context.Context, t time.Time) error

// Open opens the file for read.  Call Close() on the returned io.ReadCloser
Open(ctx context.Context, options ...OpenOption) (io.ReadCloser, error)

// Update the object with the contents of in using the modTime and size in src
Update(ctx context.Context, in io.Reader, src ObjectInfo, options ...OpenOption) error

// Removes this object
Remove(ctx context.Context) error

Object: Optional methods

// MimeType returns the content type of the Object if
// known, or "" if not
MimeType(ctx context.Context) string

// ID returns the ID of the Object if known, or "" if not
ID() string
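
To show how these pieces fit together, here is a rough usage sketch: list a directory and stream each object's contents out. The interface shapes below are trimmed-down assumptions based on the listing above, not rclone's exact definitions.

package usage // hypothetical package for this sketch

import (
	"context"
	"io"
)

// Minimal stand-ins for the types in the listing above.
type OpenOption interface{}

type DirEntry interface {
	Remote() string
}

type Object interface {
	DirEntry
	Open(ctx context.Context, options ...OpenOption) (io.ReadCloser, error)
}

type DirEntries []DirEntry

type Fs interface {
	List(ctx context.Context, dir string) (DirEntries, error)
}

// copyOut lists dir, skips anything which isn't an object (ie directories)
// and copies each object's contents to w.
func copyOut(ctx context.Context, f Fs, dir string, w io.Writer) error {
	entries, err := f.List(ctx, dir)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		obj, ok := entry.(Object)
		if !ok {
			continue // a directory, not an object
		}
		rc, err := obj.Open(ctx)
		if err != nil {
			return err
		}
		_, err = io.Copy(w, rc)
		if cerr := rc.Close(); err == nil {
			err = cerr
		}
		if err != nil {
			return err
		}
	}
	return nil
}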

Summary

I think this is an excellent effort!

I'm nervous about the optional interfaces though - I can see a proliferation of them to do all the things left out of the proposal.
