
Last active Aug 8, 2019
variations in unixfsv2 attribs

There are lots of ways we could do Attributes

Let's do all of them.

Yes, I'm serious.

Or at least a short enumeration of them. There are several viable, independently semantically useful sets of attributes we can select and specify.

All of these refer to the unixfsv2-spike#schema for the bigger context -- the Attribs structs drafted below are each a viable replacement for the Attribs there.

See especially the #using-various-schemas-as-a-feature heading for commentary on how we can use several of these at once for great victory.

Note: this is not a final proposal and is not meant to be the exclusive list of attribute sets we might want to recognize. (E.g., there's no attribute set which recognizes mtime and nothing else, and we might well want that.) It's just a conversation starter.

This is a common take.

type Attribs struct {
	mtime  Int
	posix  Int # The standard 0777 bitpacking masks.
	sticky Bool (implicit: false)
	setuid Bool (implicit: false)
	setgid Bool (implicit: false)
	uid Int
	gid Int
}
	gid Int

The suggestion here is to continue using the ancient posix bitmasks for the "rwxrwxrwx" bits, because otherwise we have nine booleans, and that's no joy.

The other bits -- the 07000 mask -- for sticky, setuid, and setgid are broken out as individual bools, because they're very rarely set, and it's nice to be able to pull those out easily since they deserve major warnings if they're being used in a dataset (setuid and setgid, anyway).
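
As a rough illustration of that split (a sketch only, not part of the spec): the helper below takes a raw POSIX mode integer and produces the fields of the struct above. The function names `split_mode` and `loud_warnings` are made up for this example, and the implicit-false fields are simply left out when unset.

```python
import stat


def split_mode(st_mode):
    """Split a raw POSIX mode into the Attribs fields sketched above.

    Hypothetical helper: 'posix' keeps only the 0777 rwxrwxrwx bits, and the
    07000 bits become individual booleans that are omitted when false.
    """
    attribs = {"posix": st_mode & 0o777}
    special_bits = (
        ("sticky", stat.S_ISVTX),  # 01000
        ("setuid", stat.S_ISUID),  # 04000
        ("setgid", stat.S_ISGID),  # 02000
    )
    for name, bit in special_bits:
        if st_mode & bit:
            attribs[name] = True
    return attribs


def loud_warnings(attribs):
    """Return a warning string for each of setuid/setgid that is set."""
    return [name + " is set" for name in ("setuid", "setgid") if attribs.get(name)]
```

Because the rare bits are their own fields, spotting them is a dictionary lookup rather than more mask arithmetic.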

This is the maximal take.

type Attribs struct {
	mtime  Int
	posix  Int # The standard 0777 bitpacking masks.
	sticky Bool (implicit: false)
	setuid Bool (implicit: false)
	setgid Bool (implicit: false)
	uid Int
	gid Int
	devMajor optional Int
	devMinor optional Int
}

These are all -- all -- the things required to directly support container images. (Ever wondered what a container acts like if "/dev/null" isn't actually a device node that, well, discards things?)

You can see a development history which finds this out the hard way in Rio's metadata package. (Note -- Linkname in that system is absent here, because we handle file types in another part of the schema, above where the attribs struct appears.)

We might use "attribs_alpha" and "attribs_gamma" both, intentionally -- use this one when we're okay accepting a filesystem that has device nodes in it; use "alpha" when we're not.
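
A minimal sketch of what "use this one or that one" could look like at the application level. This assumes "attribs_alpha" names the common take (no devMajor/devMinor) and "attribs_gamma" names this maximal one; the function name `unexpected_fields` and the dict representation are invented for illustration.

```python
# Field sets for the two schemas; which names map to which struct is an assumption.
ALPHA_FIELDS = frozenset(
    {"mtime", "posix", "sticky", "setuid", "setgid", "uid", "gid"}
)
GAMMA_FIELDS = ALPHA_FIELDS | {"devMajor", "devMinor"}


def unexpected_fields(attribs, allow_device_nodes):
    """List the attribute names that the chosen schema does not recognize.

    With allow_device_nodes=False the "alpha" field set applies, so any
    devMajor/devMinor shows up as unexpected; with True, "gamma" applies.
    """
    allowed = GAMMA_FIELDS if allow_device_nodes else ALPHA_FIELDS
    return sorted(set(attribs) - allowed)
```

An empty list means the data fits the chosen schema; anything else is a reason to reject the filesystem or switch schemas deliberately.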

This is a very minimal take.

No mtime; absolutely no posix permission or ownership bits. Just the execute bit (optionally).

type Attribs struct {
	executable Bool (implicit: false)
}

With this structure, the Attribs would often be empty entirely.

This is about as minimal as you can get.
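
To make the "often empty entirely" point concrete, here is a small sketch of how an implicit value could be elided on write and restored on read. The `IMPLICITS` table and the `encode`/`decode` names are hypothetical; real schema tooling would do this for us.

```python
# Implicit values for the minimal Attribs struct above.
IMPLICITS = {"executable": False}

_MISSING = object()  # sentinel so fields without an implicit are never elided


def encode(attribs):
    """Drop every field whose value equals its implicit."""
    return {
        key: value
        for key, value in attribs.items()
        if IMPLICITS.get(key, _MISSING) != value
    }


def decode(encoded):
    """Fill implicit values back in for any field that was elided."""
    return {**IMPLICITS, **encoded}
```

A non-executable file encodes to an empty map, and decoding that empty map gives back `executable: False`, so nothing is lost in the round trip.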

FYI, filesystems are Fun.

  • You will not set ctime unless you're writing a driver. It may be wise to give up.
  • You will not set atime in any meaningfully non-transient way even if you are writing a driver, and you'll get in arguments with proponents of noatime, relatime, and so forth along the way. It may be wise to give up.
  • The posix permissions and ownerships behave differently for symlinks on linux and mac. Weep.
  • The ability to set file times at nano precision on different platforms varies. Weep.

What this adds up to is that there is no clear superset of functionality in any of the systems in the wild; therefore, our hopes and dreams of supporting "everything" are crushed out of the gate: supporting everything for one system means supporting things that are meaningless/invalid in another, and that's true for every single system you can pick. Weep.

The name of the game in all these cases is thus figuring out -- pairwise -- which in the pair of {target platform}-vs-{our-spec} has the bigger range of values, and then defining transformation functions that are as minimally lossy and minimally surprising/irritating as possible. This is not always easy, and the approach is not always clear. Keep calm and endure.
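
One such transformation, sketched for the timestamp-precision case: truncate a nanosecond mtime down to whatever granularity the target can hold, and report whether anything was lost so the caller is not surprised. The function name and the idea of passing the granularity in as a parameter are assumptions for this example; no particular platform's granularity is implied.

```python
def to_target_mtime(mtime_ns, granularity_ns):
    """Truncate a nanosecond timestamp to the target's granularity.

    Returns (truncated_ns, lossy). Truncation rounds toward negative
    infinity, so pre-epoch timestamps are handled consistently too.
    """
    if granularity_ns < 1:
        raise ValueError("granularity must be at least 1 ns")
    truncated = mtime_ns - (mtime_ns % granularity_ns)
    return truncated, truncated != mtime_ns
```

Returning the `lossy` flag keeps the decision about what to do with a lossy conversion (warn, refuse, carry on) outside the transformation itself.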

  • Is there a path where we have our systems store the most info possible by default (i.e. included in content hashes) -- but still have most of it become "zero" and then be elided by the Implicits feature?
    • the dev nodes example -- optional in one schema, absent in the other -- provides an example of how this can be cool.
  • Can we do something better about 'executable' bits?
    • I sure hope so.
    • Unfortunately, sometimes files actually do have weird stuff like "rw-r-xr--", so if we do a separate Bool for executable, we'll sometimes have to keep the 0777 as well, and define how things should behave if those then conflict. Yuck.
  • Does that "maximal information" thing actually work in the face of mtime?
    • mtime is really a spectacular bugaboo and we should do a lot of up-front work around user stories before leaping into anything.
  • Should we also have schemas that vary in whether or not e.g. devnodes are even a file type? Probably.
    • Same for symlinks, arguably.
    • Would these schemas have different Attribs as well? Different Attribs per filetype? So far we haven't explored this...
      • If not, there are some application-level sanity checks about not having devmajor and devminor ints on non-dev-type files. Which is fine, but worth a brief mention.
    • How big of a matrix is acceptable here?
      • Might not be so bad. Most things line up well (e.g. don't have devmajor/minor attribs in the schema that doesn't support that dev in the filetype enum).
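
The application-level sanity check mentioned above could be as small as the sketch below. The file type names ("chardev", "blockdev", "file", and so on) are placeholders, since the filetype enum lives elsewhere in the schema, and requiring both numbers on a device node is an assumption of this example.

```python
# Placeholder names for the device file types; the real enum is defined elsewhere.
DEVICE_TYPES = frozenset({"chardev", "blockdev"})


def check_dev_attribs(file_type, attribs):
    """Return a list of problems with devMajor/devMinor for this file type."""
    has_major = "devMajor" in attribs
    has_minor = "devMinor" in attribs
    problems = []
    if file_type in DEVICE_TYPES:
        # Assumption: a device node is only meaningful with both numbers.
        if not (has_major and has_minor):
            problems.append("device node needs both devMajor and devMinor")
    elif has_major or has_minor:
        problems.append(file_type + " must not carry devMajor/devMinor")
    return problems
```

If Attribs ends up varying per filetype in the schema itself, this check disappears, because the bad combinations would no longer be representable.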

(This comes at a very different level of meta than the previous sections -- those were implementation details. These are some ideas of "north star" guiding principles which can try to guide decision making.)

  • Prefer simple over complex.
  • Aim for specs rather than aim for implementations.
  • Prefer convergence over coordination.

"Aim for specs" doesn't mean to say "don't build prototypes". It means someone should be able to build another implementation looking at the spec, and not looking at the details of code.

"Prefer convergence over coordination" means building systems that do the right thing by natural outcome of their design when used by people without additional coordination throughout the future as they use it. Content-addressability itself is a massive concrete bet on "convergence over coordination" -- it's our core mission. Double down on it.
