@hasufell
Last active March 7, 2016 18:48

Paths and Files

We think of a Path type and a File type as two different concepts. Both need to be expressed in a way that allows us to reason about them.

What information to encode on path level?

Is the path absolute?

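Important for thread safety: a relative path is resolved against the process-wide current working directory, which any other thread may change underneath us. There are no real drawbacks, so this should always be encoded.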
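A minimal sketch of the problem, assuming the directory package that ships with GHC. The helper names resolveIn and demo are made up for illustration. The same relative path resolves to two different absolute paths depending on the working directory at the time of the call:

```haskell
import System.Directory
  ( getTemporaryDirectory, makeAbsolute, withCurrentDirectory )

-- Resolve a relative path while the process-wide working directory is
-- temporarily set to `cwd` (hypothetical helper).
resolveIn :: FilePath -> FilePath -> IO FilePath
resolveIn cwd rel = withCurrentDirectory cwd (makeAbsolute rel)

-- The same relative path, resolved under two working directories.
-- In a threaded program the switch could come from any other thread.
demo :: IO (FilePath, FilePath)
demo = do
  tmp <- getTemporaryDirectory
  (,) <$> resolveIn "/" "foo" <*> resolveIn tmp "foo"
```

An absolute path does not have this dependency, which is why it is worth knowing at the type level which kind we hold.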

Encoding it can also fix the highly dangerous misbehavior of System.FilePath, by forcing relative or absolute paths for some combinators:

# ghci
*System.FilePath> "/home" </> "/"
"/"

I hope you don't run that stuff in production code...
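Here is a rough sketch of how a typed combinator could rule this out. It is not the API of any existing library: the Path newtype, parseAbs, parseRel and this version of (</>) are assumptions made for illustration, and only POSIX-style "/" separators are handled.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- Phantom tags; they have no values and only exist at the type level.
data Abs
data Rel

-- A path whose base (absolute or relative) is tracked in the type.
newtype Path b = Path String
  deriving (Eq, Show)

-- Smart constructors: the only way to obtain a tagged path.
parseAbs :: String -> Maybe (Path Abs)
parseAbs s
  | "/" `isPrefixOf` s = Just (Path s)
  | otherwise          = Nothing

parseRel :: String -> Maybe (Path Rel)
parseRel s
  | null s || "/" `isPrefixOf` s = Nothing
  | otherwise                    = Just (Path s)

-- The right operand must be relative, so appending an absolute path is
-- a type error instead of silently replacing the left operand.
(</>) :: Path b -> Path Rel -> Path b
Path l </> Path r
  | "/" `isSuffixOf` l = Path (l ++ r)
  | otherwise          = Path (l ++ "/" ++ r)
infixr 5 </>
```

With this, the "/home" </> "/" case from above cannot even be written: parseRel "/" returns Nothing, and a Path Abs is not accepted on the right-hand side.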

Does the path end in a trailing slash?

Difficult problem. The path library even encodes this information at the type level:

data Path b t

-- where `b` can be
data Abs
-- or
data Rel

-- and `t` can be
data File
-- or
data Dir

This has a few advantages:

  • guarantee about trailing path separators at compile-time, since Dir means the Path always ends in a trailing separator
  • a lot of functions like (</>) (combines paths) can be simplified and are more robust
  • supposed type safety about File vs Directory

But there are a lot of drawbacks too:

  • the File vs Directory distinction is basically a lie... it can be completely unrelated to what is going on at IO level. In addition, a directory is also a file (just not a regular file), so all this really tells us is whether the path ends in a trailing path separator or not.
  • it makes the API really complicated for some use cases where we basically don't care (yet) whether a file is a directory. And even if we do, the type-level information in the path is not really enough. We'll need IO actions anyway to reliably make a decision. With the path library we can leave the Dir vs File distinction open and use a general t placeholder, but some functions, like filename or dirname, still force us to make a decision.
  • in some cases we may end up with a Path b File although we know it is in fact a directory, because we need the path without a trailing separator and in exactly that type. So the types are suddenly broken for us.

Part of the File vs Dir problem could be fixed just by renaming the types, so it's clear we are really just talking about the trailing path separator and not about actual file type concepts, which the IO-free path code doesn't know about anyway.
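A sketch of what such a renaming could look like. The tag names TrailingSep and NoTrailingSep and both parse functions are hypothetical, not part of the path library:

```haskell
import Data.List (isPrefixOf, isSuffixOf)

data Abs
data Rel

-- Tags named after what the string looks like, not after what the
-- file system contains.
data TrailingSep
data NoTrailingSep

newtype Path b t = Path String
  deriving (Eq, Show)

-- Only the textual form is inspected; no IO happens here.
parseAbsTrailingSep :: String -> Maybe (Path Abs TrailingSep)
parseAbsTrailingSep s
  | "/" `isPrefixOf` s && "/" `isSuffixOf` s = Just (Path s)
  | otherwise                                = Nothing

parseAbsNoTrailingSep :: String -> Maybe (Path Abs NoTrailingSep)
parseAbsNoTrailingSep s
  | "/" `isPrefixOf` s && not ("/" `isSuffixOf` s) = Just (Path s)
  | otherwise                                      = Nothing
```

parseAbsNoTrailingSep "/etc" succeeds even though /etc is usually a directory, and the resulting type no longer claims otherwise.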

But even then, the API will remain complicated for some use cases, so the question is: when do we have to know about trailing path separators? And do we need to know at compile-time? Since we are not in a shell, we are basically only talking about low-level POSIX functions.

A few suggestions on how this might be fixed:

  1. just remove the information about trailing path separators from the type and let us deal with it when we actually care about it (no guarantee whether any path ends in a trailing path separator)
  2. just remove the information about trailing path separators from the type and also strip them when constructing a path, so all paths have no trailing separator except '/' (probably not a good idea if we want to ship this as a library)
  3. encode the information about trailing path separators on constructor-level, so we can access it when we actually care about it (will cause us to change a few types to Maybe foo)
  4. rewrite some of the library functions (basename :: Path b t -> Path Rel t instead of filename :: Path b File -> Path Rel File and dirname :: Path b Dir -> Path Rel Dir) so they don't force us to make a decision that often/early
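To make suggestion 3 a bit more concrete, here is a minimal sketch under a few assumptions: only relative paths are constructed, separators are always "/", and all names (parseRel, filename, basename, lastSegment) are made up for illustration. It also shows where a Maybe creeps in, and how a basename in the spirit of suggestion 4 avoids the early decision.

```haskell
import Data.List (dropWhileEnd, isPrefixOf)

data Abs
data Rel

-- The constructor, not the type, remembers the trailing separator.
-- The stored String itself never ends in '/'.
data Path b
  = NoTrailingSep String
  | TrailingSep   String
  deriving (Eq, Show)

parseRel :: String -> Maybe (Path Rel)
parseRel s
  | null s || "/" `isPrefixOf` s = Nothing
  | body == s                    = Just (NoTrailingSep s)
  | otherwise                    = Just (TrailingSep body)
  where
    body = dropWhileEnd (== '/') s

-- Everything after the last separator.
lastSegment :: String -> String
lastSegment = reverse . takeWhile (/= '/') . reverse

-- Only defined for paths without a trailing separator, hence the Maybe.
filename :: Path b -> Maybe (Path Rel)
filename (NoTrailingSep s) = Just (NoTrailingSep (lastSegment s))
filename (TrailingSep _)   = Nothing

-- Works for both forms and keeps the separator information.
basename :: Path b -> Path Rel
basename (NoTrailingSep s) = NoTrailingSep (lastSegment s)
basename (TrailingSep s)   = TrailingSep (lastSegment s)
```

The trade-off is visible in the types: the compile-time guarantee is gone, and callers who care have to pattern match or handle the Nothing case.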

What information to encode on file level?

  • name
  • path (via proper Path type)
  • type (via constructor?)
  • fileinfo like modification time, device id, ...

Proposal for File type

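import Control.Exception (IOException)
import System.Posix.Types
  ( DeviceID, EpochTime, FileID, FileMode, FileOffset
  , GroupID, LinkCount, UserID )

-- |Represents a file. The `anchor` field is the base path
-- to that file without the filename.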
data AnchoredFile a =
  (:/) { anchor :: String, file :: File a }
  deriving (Eq, Show)

-- |The String in the name field is always a file name, never a full path.
-- The free type variable is used in the File/Dir constructor and can hold
-- Handles, Strings representing a file's contents or anything else you can
-- think of. We catch any IO errors in the Failed constructor. An exception
-- can be converted to a String with 'show'.
data File a =
    Failed {
    name :: String
  , err  :: IOException
  }
  | Dir {
    name :: String
  , fvar :: a
  }
  | RegFile {
    name :: String
  , fvar :: a
  }
  | SymLink {
    name  :: String
  , fvar  :: a
  , sdest :: AnchoredFile a  -- ^ symlink madness,
                             --   we need to know where it points to
  }
  | BlockDev {
    name :: String
  , fvar :: a
  }
  | CharDev {
    name :: String
  , fvar :: a
  }
  | NamedPipe {
    name :: String
  , fvar :: a
  }
  | Socket {
    name :: String
  , fvar :: a
  } deriving (Show, Eq)

-- |This can be thrown into the free `a` variable of `File`.
data FileInfo = FileInfo {
    deviceID :: DeviceID
  , fileID :: FileID
  , fileMode :: FileMode
  , linkCount :: LinkCount
  , fileOwner :: UserID
  , fileGroup :: GroupID
  , specialDeviceID :: DeviceID
  , fileSize :: FileOffset
  , accessTime :: EpochTime
  , modificationTime :: EpochTime
  , statusChangeTime :: EpochTime
} deriving (Show, Eq, Ord)

However, this does not use a strict path type. Since we construct the File via special functions, do we even need it at that point?
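As a rough idea of what such a special function might do, here is a sketch that asks the file system for the actual file type. It assumes the unix package (System.Posix.Files). The FileType enumeration is a reduced stand-in for the constructors of the proposed File type, and the names classify and statType are hypothetical.

```haskell
import Control.Exception (IOException, try)
import qualified System.Posix.Files as PF

-- Reduced stand-in for the constructors of the proposed File type.
data FileType
  = Dir | RegFile | SymLink | BlockDev | CharDev | NamedPipe | Socket
  | Unknown
  deriving (Eq, Show)

-- Map the result of an lstat call to a file type.
classify :: PF.FileStatus -> FileType
classify st
  | PF.isDirectory st       = Dir
  | PF.isRegularFile st     = RegFile
  | PF.isSymbolicLink st    = SymLink
  | PF.isBlockDevice st     = BlockDev
  | PF.isCharacterDevice st = CharDev
  | PF.isNamedPipe st       = NamedPipe
  | PF.isSocket st          = Socket
  | otherwise               = Unknown

-- Does not follow symlinks; IO errors are returned instead of thrown,
-- which corresponds to the Failed constructor in the proposal.
statType :: FilePath -> IO (Either IOException FileType)
statType fp = fmap (fmap classify) (try (PF.getSymbolicLinkStatus fp))
```

The decision about the file type happens here, in IO, no matter what the path looked like. That is an argument for not needing Dir vs File in the path type at this point.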
