@milseman
Last active February 5, 2021 16:32
[Proposal v2] FilePath Syntactic Operations

FilePath Syntactic Operations

Introduction

FilePath appeared in System 0.0.1 with a minimal API. This proposal adds API for syntactic operations, which operate on the structure of the path alone and thus neither consult the file system nor make any system calls. These include inspecting the structure of paths, modifying paths, and accessing individual components.

Additionally, this proposal greatly expands Windows support and enables writing platform-agnostic path manipulation code.

Future Work: Operations that consult the file system, e.g. resolving symlinks.

Design

Windows support

Furthering Swift's push for Windows and System's initial Windows support, this proposal updates FilePath to work well on Windows in addition to Unix platforms.

The proposed API is designed to work naturally and intuitively with both Unix-style and Windows paths. Most of the concepts and even terminology are shared across platforms, though there are some minor differences (e.g. this proposal uses the word "absolute" to refer to what is formally called "fully-qualified" on Windows).

Introducing FilePath.Root and FilePath.Component

FilePath.Root represents the root of a path. On Unix, this is simply /, but on Windows it can include volume and server/share information.

extension FilePath {
  /// Represents a root of a file path.
  ///
  /// On Unix, a root is simply the directory separator `/`.
  ///
  /// On Windows, a root contains the entire path prefix up to and including
  /// the final separator.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/`
  /// * Windows:
  ///   * `C:\`
  ///   * `C:`
  ///   * `\`
  ///   * `\\server\share\`
  ///   * `\\?\UNC\server\share\`
  ///   * `\\?\Volume{12345678-abcd-1111-2222-123445789abc}\`
  public struct Root { }
}

Future Work: Windows root analysis APIs communicating syntactic form as well as volume or server/share information.

FilePath.Component represents a single non-root component of a path. A component can be a file or directory name, or one of the special directories . and .. (current and parent directory). Components are always non-empty and never contain a directory separator.

extension FilePath {
  /// Represents an individual, non-root component of a file path.
  ///
  /// Components can be one of the special directory components (`.` or `..`)
  /// or a file or directory name. Components are never empty and never
  /// contain the directory separator.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     let file: FilePath.Component = "foo.txt"
  ///     file.kind == .regular           // true
  ///     file.extension                  // "txt"
  ///     path.append(file)               // path is "/tmp/foo.txt"
  ///
  public struct Component {
    /// Whether a component is a regular file or directory name, or a special
    /// directory `.` or `..`
    public enum Kind {
      /// The special directory `.`, representing the current directory.
      case currentDirectory

      /// The special directory `..`, representing the parent directory.
      case parentDirectory

      /// A file or directory name
      case regular
    }

    /// The kind of this component
    public var kind: Kind { get }
  }
}

These can conveniently be created from a string literal and can be printed, just like FilePath.

extension FilePath.Component: CustomStringConvertible, CustomDebugStringConvertible, ExpressibleByStringLiteral {
  /// A textual representation of the path component.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var description: String { get }

  /// A textual representation of the path component, suitable for debugging.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var debugDescription: String { get }

  /// Create a file path component from a string literal.
  ///
  /// Precondition: `stringLiteral` is non-empty and has only one component in it.
  public init(stringLiteral: String)

  /// Create a file path component from a string.
  ///
  /// Returns `nil` if `string` is empty, a root, or has more than one component
  /// in it.
  public init?(_ string: String)
}

extension FilePath.Root: CustomStringConvertible, CustomDebugStringConvertible, ExpressibleByStringLiteral {
  /// A textual representation of the path root.
  ///
  /// If the content of the path root isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  @inline(never)
  public var description: String { get }

  /// A textual representation of the path root, suitable for debugging.
  ///
  /// If the content of the path root isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var debugDescription: String { get }

  /// Create a file path root from a string literal.
  ///
  /// Precondition: `stringLiteral` is non-empty and is a root.
  public init(stringLiteral: String)

  /// Create a file path root from a string.
  ///
  /// Returns `nil` if `string` is empty or is not a root.
  public init?(_ string: String)
}

FilePath.ComponentView

FilePath.ComponentView is a BidirectionalCollection and RangeReplaceableCollection of the non-root components that comprise a path. The Index type is an opaque wrapper around FilePath's underlying storage index.

extension FilePath {
  /// A bidirectional, range replaceable collection of the non-root components
  /// that make up a file path.
  ///
  /// ComponentView provides access to standard `BidirectionalCollection`
  /// algorithms for accessing components from the front or back, as well as
  /// standard `RangeReplaceableCollection` algorithms for modifying the
  /// file path using component or range of components granularity.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/./home/./username/scripts/./tree"
  ///     let scriptIdx = path.components.lastIndex(of: "scripts")!
  ///     path.components.insert("bin", at: scriptIdx)
  ///     // path is "/./home/./username/bin/scripts/./tree"
  ///
  ///     path.components.removeAll { $0.kind == .currentDirectory }
  ///     // path is "/home/username/bin/scripts/tree"
  ///
  public struct ComponentView: BidirectionalCollection, RangeReplaceableCollection { }

  /// View the non-root components that make up this path.
  public var components: ComponentView { get set }
}

FilePath can be created from a given root and components. FilePath.init(root:_:ComponentView.SubSequence) is a more efficient overload that can directly access the underlying storage, which already has normalized separators between components.

extension FilePath {
  /// Create a file path from a root and a collection of components.
  public init<C: Collection>(root: Root?, _ components: C)
    where C.Element == Component

  /// Create a file path from a root and any number of components.
  public init(root: Root?, components: Component...)

  /// Create a file path from an optional root and a slice of another path's
  /// components.
  public init(root: Root?, _ components: ComponentView.SubSequence)
}

Basic queries

extension FilePath {
  /// Returns whether `other` is a prefix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.starts(with: "/")              // true
  ///     path.starts(with: "/usr/bin")       // true
  ///     path.starts(with: "/usr/bin/ls")    // true
  ///     path.starts(with: "/usr/bin/ls///") // true
  ///     path.starts(with: "/us")            // false
  ///
  public func starts(with other: FilePath) -> Bool

  /// Returns whether `other` is a suffix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.ends(with: "ls")             // true
  ///     path.ends(with: "bin/ls")         // true
  ///     path.ends(with: "usr/bin/ls")     // true
  ///     path.ends(with: "/usr/bin/ls///") // true
  ///     path.ends(with: "/ls")            // false
  ///
  public func ends(with other: FilePath) -> Bool

  /// Whether this path is empty
  public var isEmpty: Bool { get }
}
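
As a rough illustration of the whole-component matching described in the doc comments above, here is a minimal sketch that models Unix-style paths as plain strings. The helper names `components(of:)`, `pathStarts`, and `pathEnds` are hypothetical and not part of the proposed API. The sketch assumes `/` is the only separator, assumes that a prefix must agree with the path on whether a root is present, and ignores Windows roots entirely.

```swift
// Hypothetical model of whole-component prefix/suffix matching on
// Unix-style path strings. This is not the FilePath implementation.

/// Splits a path string into its non-empty components.
func components(of path: String) -> [Substring] {
  // split drops empty pieces, so "a//b/" yields ["a", "b"].
  path.split(separator: "/")
}

func pathStarts(_ path: String, with prefix: String) -> Bool {
  // Assumption of this sketch: both paths must agree on having a root.
  guard path.hasPrefix("/") == prefix.hasPrefix("/") else { return false }
  return components(of: path).starts(with: components(of: prefix))
}

func pathEnds(_ path: String, with suffix: String) -> Bool {
  let p = components(of: path)
  let s = components(of: suffix)
  if suffix.hasPrefix("/") {
    // A rooted suffix can only match the entire path.
    return path.hasPrefix("/") && p == s
  }
  // Compare components from the back.
  return p.reversed().starts(with: s.reversed())
}

print(pathStarts("/usr/bin/ls", with: "/usr/bin"))  // true
print(pathStarts("/usr/bin/ls", with: "/us"))       // false
print(pathEnds("/usr/bin/ls", with: "bin/ls"))      // true
print(pathEnds("/usr/bin/ls", with: "/ls"))         // false
```

Comparing arrays of components rather than raw characters is what makes `"/us"` fail to match `"/usr/bin/ls"`.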

Windows roots are more complex. They can take several different syntactic forms and can carry additional information such as a drive letter or server/share information. Moreover, the presence of a root does not mean that the path is absolute (i.e. "fully-qualified" in Windows-speak).

For example, C:foo refers to foo relative to the current directory on the C drive, and \foo refers to foo at the root of the current drive. Neither of those is absolute, i.e. fully-qualified, even though both have roots.

extension FilePath {
  /// Returns true if this path uniquely identifies the location of
  /// a file without reference to an additional starting location.
  ///
  /// On Unix platforms, absolute paths begin with a `/`. `isAbsolute` is
  /// equivalent to `root != nil`.
  ///
  /// On Windows, absolute paths are fully qualified paths. `isAbsolute` is
  /// _not_ equivalent to `root != nil` for traditional DOS paths
  /// (e.g. `C:foo` and `\bar` have roots but are not absolute). UNC paths
  /// and device paths are always absolute. Traditional DOS paths are
  /// absolute only if they begin with a volume or drive followed by
  /// a `:` and a separator.
  ///
  /// NOTE: This does not perform shell expansion or substitute
  /// environment variables; paths beginning with `~` are considered relative.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/local/bin`
  ///   * `/tmp/foo.txt`
  ///   * `/`
  /// * Windows:
  ///   * `C:\Users\`
  ///   * `\\?\UNC\server\share\bar.exe`
  ///   * `\\server\share\bar.exe`
  public var isAbsolute: Bool { get }

  /// Returns true if this path is not absolute (see `isAbsolute`).
  ///
  /// Examples:
  /// * Unix:
  ///   * `~/bar`
  ///   * `tmp/foo.txt`
  /// * Windows:
  ///   * `bar\baz`
  ///   * `C:Users\`
  ///   * `\Users`
  public var isRelative: Bool { get }
}
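
The Windows rules in the `isAbsolute` doc comment can be sketched as a small classifier over the leading characters of a string. The function name `isFullyQualified(windowsPath:)` is hypothetical, and the sketch makes simplifying assumptions: a drive is a single letter, and anything starting with two separators is treated as a UNC or device path without further validation.

```swift
// Hypothetical classifier for the Windows rules described above.
// It only inspects the leading characters; it is not the FilePath parser.
func isFullyQualified(windowsPath path: String) -> Bool {
  // Treat forward slashes as separators too.
  let chars: [Character] = path.map { $0 == "/" ? "\\" : $0 }

  // UNC paths (\\server\share\...) and device paths (\\?\... or \\.\...)
  // are always absolute.
  let doubleSeparator: [Character] = ["\\", "\\"]
  if chars.starts(with: doubleSeparator) { return true }

  // Traditional DOS paths are absolute only when they begin with a
  // drive letter, a colon, and then a separator.
  return chars.count >= 3
    && chars[0].isLetter
    && chars[1] == ":"
    && chars[2] == "\\"
}

print(isFullyQualified(windowsPath: #"C:\Users\"#))  // true
print(isFullyQualified(windowsPath: #"C:Users\"#))   // false
print(isFullyQualified(windowsPath: #"\Users"#))     // false
```

Note that `C:Users\` and `\Users` both have a root in the sense of this proposal, yet neither is classified as absolute.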

Path decomposition and analysis

Paths can be decomposed into their (optional) root and their (potentially empty) components.

extension FilePath {
  /// Returns the root of a path if there is one, otherwise `nil`.
  ///
  /// On Unix, this will return the leading `/` if the path is absolute
  /// and `nil` if the path is relative.
  ///
  /// On Windows, for traditional DOS paths, this will return
  /// the path prefix up to and including a root directory or
  /// a supplied drive or volume. Otherwise, if the path is relative to
  /// both the current directory and current drive, returns `nil`.
  ///
  /// On Windows, for UNC or device paths, this will return the path prefix
  /// up to and including the host and share for UNC paths or the volume for
  /// device paths followed by any subsequent separator.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/foo/bar => /`
  ///   * `foo/bar  => nil`
  /// * Windows:
  ///   * `C:\foo\bar                => C:\`
  ///   * `C:foo\bar                 => C:`
  ///   * `\foo\bar                  => \`
  ///   * `foo\bar                   => nil`
  ///   * `\\server\share\file       => \\server\share\`
  ///   * `\\?\UNC\server\share\file => \\?\UNC\server\share\`
  ///   * `\\.\device\folder         => \\.\device\`
  ///
  /// Setting the root to `nil` will remove the root and setting a new
  /// root will replace the root.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/foo/bar"
  ///     path.root = nil // path is "foo/bar"
  ///     path.root = "/" // path is "/foo/bar"
  ///
  /// Example (Windows):
  ///
  ///     var path: FilePath = #"\foo\bar"#
  ///     path.root = nil         // path is #"foo\bar"#
  ///     path.root = "C:"        // path is #"C:foo\bar"#
  ///     path.root = #"C:\"#     // path is #"C:\foo\bar"#
  ///
  public var root: FilePath.Root? { get set }

  /// Creates a new path containing just the components, i.e. everything
  /// after `root`.
  ///
  /// Returns self if `root == nil`.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/foo/bar => foo/bar`
  ///   * `foo/bar  => foo/bar`
  ///   * `/        => ""`
  /// * Windows:
  ///   * `C:\foo\bar                  => foo\bar`
  ///   * `foo\bar                     => foo\bar`
  ///   * `\\?\UNC\server\share\file   => file`
  ///   * `\\?\device\folder\file.exe  => folder\file.exe`
  ///   * `\\server\share\file         => file`
  ///   * `\                           => ""`
  ///
  public func removingRoot() -> FilePath
}

Setters allow for in-place mutation. FilePath.root's setter allows making a path relative or absolute, and even allows switching root representations on Windows.

A common decomposition of a path is between its last non-root component and everything prior to it (e.g. basename and dirname in C).

extension FilePath {
  /// Returns the final component of the path.
  /// Returns `nil` if the path is empty or only contains a root.
  ///
  /// Note: Even if the final component is a special directory
  /// (`.` or `..`), it will still be returned. See `lexicallyNormalize()`.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/local/bin/ => bin`
  ///   * `/tmp/foo.txt    => foo.txt`
  ///   * `/tmp/foo.txt/.. => ..`
  ///   * `/tmp/foo.txt/.  => .`
  ///   * `/               => nil`
  /// * Windows:
  ///   * `C:\Users\                    => Users`
  ///   * `C:Users\                     => Users`
  ///   * `C:\                          => nil`
  ///   * `\Users\                      => Users`
  ///   * `\\?\UNC\server\share\bar.exe => bar.exe`
  ///   * `\\server\share               => nil`
  ///   * `\\?\UNC\server\share\        => nil`
  ///
  public var lastComponent: Component? { get }

  /// Creates a new path with everything up to but not including
  /// `lastComponent`.
  ///
  /// If the path only contains a root, returns `self`.
  /// If the path has no root and only includes a single component,
  /// returns an empty FilePath.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/bin/ls => /usr/bin`
  ///   * `/foo        => /`
  ///   * `/           => /`
  ///   * `foo         => ""`
  /// * Windows:
  ///   * `C:\foo\bar.exe                 => C:\foo`
  ///   * `C:\                            => C:\`
  ///   * `\\server\share\folder\file.txt => \\server\share\folder`
  ///   * `\\server\share\                => \\server\share\`
  public func removingLastComponent() -> FilePath
}

To help with discoverability for new users of System, we add unavailable-renamed declarations for basename and dirname (similar to the unavailable-renamed declarations for constants elsewhere in System):

extension FilePath {
  @available(*, unavailable, renamed: "removingLastComponent()")
  public var dirname: FilePath { removingLastComponent() }

  @available(*, unavailable, renamed: "lastComponent")
  public var basename: Component? { lastComponent }
}
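
To make the basename/dirname-style split concrete, the following minimal sketch models `lastComponent` and `removingLastComponent()` on Unix-style path strings. The function `splitLastComponent` is a hypothetical helper, not part of the proposal, and it assumes `/` is the only separator and the only possible root.

```swift
// Hypothetical model of lastComponent / removingLastComponent() on
// Unix-style path strings. This is not the FilePath implementation.
func splitLastComponent(_ path: String) -> (parent: String, last: String?) {
  let root = path.hasPrefix("/") ? "/" : ""
  // Non-root components only; empty pieces from repeated or trailing
  // separators are dropped.
  var comps = path.split(separator: "/")
  // popLast() returns nil when there are no components (empty or root-only).
  let last = comps.popLast().map { String($0) }
  return (root + comps.joined(separator: "/"), last)
}

let split = splitLastComponent("/usr/bin/ls")
print(split.parent, split.last ?? "nil")  // /usr/bin ls

let rootOnly = splitLastComponent("/")
print(rootOnly.parent, rootOnly.last ?? "nil")  // / nil
```

A root-only path keeps its root and reports no last component, which mirrors the `/ => /` and `/ => nil` rows in the doc comments above.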

Components may be decomposed into their stem and (optional) extension (.txt, .o, .app, etc.). FilePath gains convenience APIs for dealing with the stem and extension of the last component, if it exists. FilePath also gains a setter for extension for easy and efficient in-place mutation.

extension FilePath.Component {
  /// The extension of this file or directory component.
  ///
  /// If `self` does not contain a `.` anywhere, or only
  /// at the start, returns `nil`. Otherwise, returns everything after the
  /// last `.`.
  ///
  /// Examples:
  ///   * `foo.txt    => txt`
  ///   * `foo.tar.gz => gz`
  ///   * `Foo.app    => app`
  ///   * `.hidden    => nil`
  ///   * `..         => nil`
  ///
  public var `extension`: String? { get }

  /// The non-extension portion of this file or directory component.
  ///
  /// Examples:
  ///   * `foo.txt => foo`
  ///   * `foo.tar.gz => foo.tar`
  ///   * `Foo.app => Foo`
  ///   * `.hidden => .hidden`
  ///   * `..      => ..`
  ///
  public var stem: String { get }
}

extension FilePath {
  /// The extension of the file or directory last component.
  ///
  /// If `lastComponent` is `nil` or one of the special path components
  /// `.` or `..`, `get` returns `nil` and `set` does nothing.
  ///
  /// If `lastComponent` does not contain a `.` anywhere, or only
  /// at the start, `get` returns `nil` and `set` will append a
  /// `.` and `newValue` to `lastComponent`.
  ///
  /// Otherwise `get` returns everything after the last `.` and `set` will
  /// replace the extension.
  ///
  /// Examples:
  ///   * `/tmp/foo.txt                 => txt`
  ///   * `/Applications/Foo.app/        => app`
  ///   * `/Applications/Foo.app/bar.txt => txt`
  ///   * `/tmp/foo.tar.gz              => gz`
  ///   * `/tmp/.hidden                 => nil`
  ///   * `/tmp/.hidden.                => ""`
  ///   * `/tmp/..                      => nil`
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp/file"
  ///     path.extension = "txt"  // path is "/tmp/file.txt"
  ///     path.extension = "o"    // path is "/tmp/file.o"
  ///     path.extension = nil    // path is "/tmp/file"
  ///     path.extension = ""     // path is "/tmp/file."
  ///
  public var `extension`: String? { get set }

  /// The non-extension portion of the file or directory last component.
  ///
  /// Returns `nil` if `lastComponent` is `nil`
  ///
  ///   * `/tmp/foo.txt                 => foo`
  ///   * `/Applications/Foo.app/        => Foo`
  ///   * `/Applications/Foo.app/bar.txt => bar`
  ///   * `/tmp/.hidden                 => .hidden`
  ///   * `/tmp/..                      => ..`
  ///   * `/                            => nil`
  public var stem: String? { get }
}
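
The stem/extension rules listed in the doc comments above can be modeled on plain strings. This is only a sketch: `splitExtension` and `replacingExtension` are hypothetical helpers operating on a single component name, and they assume the setter appends a `.` followed by the new value, as the `extension` doc comment states.

```swift
// Hypothetical model of stem / extension on a single component name.
// This is not the FilePath.Component implementation.
func splitExtension(_ name: String) -> (stem: String, ext: String?) {
  // The special directories never have an extension.
  if name == "." || name == ".." { return (name, nil) }
  // A dot at the very start marks a hidden file, not an extension.
  guard let dot = name.lastIndex(of: "."), dot != name.startIndex else {
    return (name, nil)
  }
  // Everything after the last dot is the extension (possibly empty).
  return (String(name[..<dot]), String(name[name.index(after: dot)...]))
}

func replacingExtension(of name: String, with newExt: String?) -> String {
  // Setting an extension on "." or ".." does nothing.
  if name == "." || name == ".." { return name }
  let stem = splitExtension(name).stem
  guard let newExt = newExt else { return stem }  // nil removes the extension
  return stem + "." + newExt
}

print(splitExtension("foo.tar.gz").stem)            // foo.tar
print(replacingExtension(of: "file", with: "txt"))  // file.txt
print(replacingExtension(of: "file.txt", with: nil)) // file
```

Under these rules an empty-string extension still leaves a trailing dot, which is why `.hidden.` reports `""` rather than `nil`.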

Future Work: Component analysis API providing whether a component is hidden (begins with a .) as well as the ability to iterate over all of the extensions (e.g. .tar.gz). Additionally, modification helpers such as the ability to append to the stem (e.g. -backup).

FilePath.extension's setter allows for convenient in-place reassigning or adding/removing of an extension.

Rationale: stem and extension are expressed as Strings which are far more ergonomic than a slice of raw platform chars. Thus, these operations perform Unicode error correction, which is desirable most of the time when reading content (and setting extensions containing invalid Unicode is especially indicative of a programming error).

Rationale: FilePath.stem does not have a setter. Components are never empty and always have stems. Thus, setting a stem to "" or even nil will result in either an invalid component or a hidden file whose new name was an extension. Setting stems in this fashion is indicative of a programming error.

Rationale: Components presently do not have mutating operations such as setters. Components are slice types and participate in COW; they wouldn't mutate the containing path unless part of an accessor chain. Allowing mutations would give the false impression that modifications would be published back to the containing path, e.g. while iterating the components. This decision could be revisited if Swift gains support for mutable borrows.

Lexical operations

FilePath supports lexical operations (i.e. operations that do not call into the file system to, e.g., follow symlinks), such as normalization of the special directory components (. and ..).

extension FilePath {
  /// Whether the path is in lexical-normal form, that is `.` and `..`
  /// components have been collapsed lexically (i.e. without following
  /// symlinks).
  ///
  /// Examples:
  /// * `"/usr/local/bin".isLexicallyNormal == true`
  /// * `"../local/bin".isLexicallyNormal   == true`
  /// * `"local/bin/..".isLexicallyNormal   == false`
  public var isLexicallyNormal: Bool { get }

  /// Collapse `.` and `..` components lexically (i.e. without following
  /// symlinks).
  ///
  /// Examples:
  /// * `/usr/./local/bin/.. => /usr/local`
  /// * `/../usr/local/bin   => /usr/local/bin`
  /// * `../usr/local/../bin => ../usr/bin`
  public mutating func lexicallyNormalize()

  /// Returns a copy of `self` in lexical-normal form, that is `.` and `..`
  /// components have been collapsed lexically (i.e. without following
  /// symlinks). See `lexicallyNormalize`
  public func lexicallyNormalized() -> FilePath
}
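
One way to picture lexical normalization is as a single pass over the components with a stack. The sketch below works on Unix-style strings only; `lexicallyNormalizedString` is a hypothetical free function, not the proposed method, and it assumes that `..` directly under the root is dropped while a leading `..` in a relative path is preserved, matching the examples above.

```swift
// Hypothetical model of lexical normalization on Unix-style path strings.
// No file system access and no symlink resolution takes place.
func lexicallyNormalizedString(_ path: String) -> String {
  let isRooted = path.hasPrefix("/")
  var kept: [Substring] = []
  for comp in path.split(separator: "/") {
    if comp == "." {
      continue                     // "." never changes the location
    } else if comp == ".." {
      if let last = kept.last, last != ".." {
        kept.removeLast()          // ".." cancels the preceding name
      } else if !isRooted {
        kept.append(comp)          // leading ".." of a relative path stays
      }
      // In a rooted path, ".." directly under the root is dropped.
    } else {
      kept.append(comp)
    }
  }
  return (isRooted ? "/" : "") + kept.joined(separator: "/")
}

print(lexicallyNormalizedString("/usr/./local/bin/.."))  // /usr/local
print(lexicallyNormalizedString("/../usr/local/bin"))    // /usr/local/bin
print(lexicallyNormalizedString("../usr/local/../bin"))  // ../usr/bin
```

Because the pass is purely textual, a `..` that follows a symlinked directory is collapsed as if the symlink were an ordinary directory.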

FilePath provides API to help protect against arbitrary path traversal from untrusted sub-paths:

extension FilePath {
  /// Create a new `FilePath` by resolving `subpath` relative to `self`,
  /// ensuring that the result is lexically contained within `self`.
  ///
  /// `subpath` will be lexically normalized (see `lexicallyNormalize`) as
  /// part of resolution, meaning any contained `.` and `..` components will
  /// be collapsed without resolving symlinks. Any root in `subpath` will be
  /// ignored.
  ///
  /// Returns `nil` if the result would "escape" from `self` through use of
  /// the special directory component `..`.
  ///
  /// This is useful for protecting against arbitrary path traversal from an
  /// untrusted subpath: the result is guaranteed to be lexically contained
  /// within `self`. Since this operation does not consult the file system to
  /// resolve symlinks, any escaping symlinks nested inside of `self` can still
  /// be targeted by the result.
  ///
  /// Example:
  ///
  ///     let staticContent: FilePath = "/var/www/my-website/static"
  ///     let links: [FilePath] =
  ///       ["index.html", "/assets/main.css", "../../../../etc/passwd"]
  ///     links.map { staticContent.lexicallyResolving($0) }
  ///       // ["/var/www/my-website/static/index.html",
  ///       //  "/var/www/my-website/static/assets/main.css",
  ///       //  nil]
  public func lexicallyResolving(_ subpath: FilePath) -> FilePath?
}
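
The containment check can be sketched in the same string-based style. `resolveLexically` is a hypothetical helper, not the proposed method; it assumes Unix-style separators, ignores any root on the subpath, and returns `nil` as soon as a `..` would climb above the base.

```swift
// Hypothetical model of lexicallyResolving(_:) on Unix-style path strings.
// This is not the FilePath implementation and does not resolve symlinks.
func resolveLexically(base: String, subpath: String) -> String? {
  var kept: [Substring] = []
  // split drops a leading "/", so any root in the subpath is ignored.
  for comp in subpath.split(separator: "/") {
    if comp == "." {
      continue
    } else if comp == ".." {
      // Nothing left to cancel: the subpath would escape the base.
      if kept.isEmpty { return nil }
      kept.removeLast()
    } else {
      kept.append(comp)
    }
  }
  if kept.isEmpty { return base }
  let separator = base.hasSuffix("/") ? "" : "/"
  return base + separator + kept.joined(separator: "/")
}

let staticRoot = "/var/www/my-website/static"
print(resolveLexically(base: staticRoot, subpath: "/assets/main.css") ?? "nil")
// /var/www/my-website/static/assets/main.css
print(resolveLexically(base: staticRoot, subpath: "../../../../etc/passwd") ?? "nil")
// nil
```

As the doc comment above notes, this guarantee is lexical only: a symlink stored inside the base directory can still point outside of it.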

Modifying paths

FilePath supports common mutation operations.

extension FilePath {
  /// If `prefix` is a prefix of `self`, removes it and returns `true`.
  /// Otherwise returns `false`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/local/bin"
  ///     path.removePrefix("/usr/bin")   // false
  ///     path.removePrefix("/us")        // false
  ///     path.removePrefix("/usr/local") // true, path is "bin"
  ///
  public mutating func removePrefix(_ prefix: FilePath) -> Bool

  /// Append a `component` on to the end of this path.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     let sub: FilePath = "foo/./bar/../baz/."
  ///     for comp in sub.components.filter({ $0.kind != .currentDirectory }) {
  ///       path.append(comp)
  ///     }
  ///     // path is "/tmp/foo/bar/../baz"
  ///
  public mutating func append(_ component: FilePath.Component)

  /// Append `components` on to the end of this path.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/"
  ///     path.append(["usr", "local"])     // path is "/usr/local"
  ///     let otherPath: FilePath = "/bin/ls"
  ///     path.append(otherPath.components) // path is "/usr/local/bin/ls"
  ///
  public mutating func append<C: Collection>(_ components: C)
    where C.Element == FilePath.Component

  /// Append the contents of `other`, ignoring any spurious leading separators.
  ///
  /// A leading separator is spurious if `self` is non-empty.
  ///
  /// Example:
  ///
  ///     var path: FilePath = ""
  ///     path.append("/var/www/website") // path is "/var/www/website"
  ///     path.append("static/assets")    // path is "/var/www/website/static/assets"
  ///     path.append("/main.css")        // path is "/var/www/website/static/assets/main.css"
  ///
  public mutating func append(_ other: String)

  /// Non-mutating version of `append(_:Component)`.
  public func appending(_ other: Component) -> FilePath

  /// Non-mutating version of `append(_:C)`.
  public func appending<C: Collection>(
    _ components: C
  ) -> FilePath where C.Element == FilePath.Component

  /// Non-mutating version of `append(_:String)`.
  public func appending(_ other: String) -> FilePath

  /// If `other` does not have a root, append each component of `other`. If
  /// `other` has a root, replaces `self` with other.
  ///
  /// This operation mimics traversing a directory structure (similar to the
  /// `cd` command), where pushing a relative path will append its components
  /// and pushing an absolute path will first clear `self`'s existing
  /// components.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     path.push("dir/file.txt") // path is "/tmp/dir/file.txt"
  ///     path.push("/bin")         // path is "/bin"
  ///
  public mutating func push(_ other: FilePath)

  /// Non-mutating version of `push()`
  public func pushing(_ other: FilePath) -> FilePath

  /// In-place mutating variant of `removingLastComponent`.
  ///
  /// If `self` only contains a root, does nothing and returns `false`.
  /// Otherwise removes `lastComponent` and returns `true`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/bin"
  ///     path.removeLastComponent() == true  // path is "/usr"
  ///     path.removeLastComponent() == true  // path is "/"
  ///     path.removeLastComponent() == false // path is "/"
  ///
  @discardableResult
  public mutating func removeLastComponent() -> Bool

  /// Remove the contents of the path, keeping the null terminator.
  public mutating func removeAll(keepingCapacity: Bool = false)

  /// Reserve enough storage space to store `minimumCapacity` platform
  /// characters.
  public mutating func reserveCapacity(_ minimumCapacity: Int)
}

Rationale: removeLastComponent does not return the component, as components are slices of FilePath's underlying storage. Returning a removed component would trigger a COW copy.

Rationale: We do not propose append taking a FilePath since appending absolute paths is problematic. Silently ignoring a root (loose stringy semantics) is commonly expected when given a string literal, so we provide an overload of append taking a String, which is far more convenient than splitting components out by hand. Silently ignoring a root is surprising and undesirable in programmatic/strongly-typed use cases, so we provide push which has similar semantics to operations from other languages (Rust's push, C#'s Combine, Python's join, and C++17's append). This allows programmatic use cases to explicitly choose semantics by calling either other.push(myPath) or other.append(myPath.components), depending on the desired behavior.
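
The difference between the two semantics can be illustrated with a small string-based model. Both functions below are hypothetical sketches for Unix-style paths, not the proposed methods: `appendingString` ignores leading separators when the base is non-empty, while `pushingPath` lets a rooted argument replace the base.

```swift
// Hypothetical models of append(_: String) and push(_:) on Unix-style
// path strings. These are not the FilePath implementations.
func appendingString(_ base: String, _ other: String) -> String {
  // With an empty base, a leading separator is meaningful and kept.
  if base.isEmpty { return other }
  // Otherwise leading separators are spurious and dropped.
  let trimmed = String(other.drop(while: { $0 == "/" }))
  if trimmed.isEmpty { return base }
  return base.hasSuffix("/") ? base + trimmed : base + "/" + trimmed
}

func pushingPath(_ base: String, _ other: String) -> String {
  // A rooted argument replaces the base, similar to `cd /somewhere`.
  other.hasPrefix("/") ? other : appendingString(base, other)
}

print(appendingString("/var/www/website", "/main.css"))  // /var/www/website/main.css
print(pushingPath("/tmp", "dir/file.txt"))               // /tmp/dir/file.txt
print(pushingPath("/tmp/dir/file.txt", "/bin"))          // /bin
```

The same rooted argument therefore yields different results depending on which operation the caller chooses, which is the explicit choice the rationale above describes.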

Paths and strings

Just like FilePath, FilePath.Component and FilePath.Root can be decoded/validated into a Swift String.

extension String {
  /// Creates a string by interpreting the path component's content as UTF-8 on
  /// Unix and UTF-16 on Windows.
  ///
  /// - Parameter component: The path component to be interpreted as
  ///   `CInterop.PlatformUnicodeEncoding`.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that, depending on the semantics of the specific file system,
  /// conversion to a string and back to a path component
  /// might result in a value that's different from the original path component.
  public init(decoding component: FilePath.Component)

  /// Creates a string from a path component, validating its contents as UTF-8
  /// on Unix and UTF-16 on Windows.
  ///
  /// - Parameter component: The path component to be interpreted as
  ///   `CInterop.PlatformUnicodeEncoding`.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this initializer returns `nil`.
  public init?(validating component: FilePath.Component)

  /// On Unix, creates the string `"/"`
  ///
  /// On Windows, creates a string by interpreting the path root's content as
  /// UTF-16.
  ///
  /// - Parameter root: The path root to be interpreted as
  ///   `CInterop.PlatformUnicodeEncoding`.
  ///
  /// If the content of the path root isn't a well-formed Unicode string,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that on Windows,
  /// conversion to a string and back to a path root
  /// might result in a value that's different from the original path root.
  public init(decoding root: FilePath.Root) {
    self.init(_decoding: root)
  }

  /// On Unix, creates the string `"/"`
  ///
  /// On Windows, creates a string from a path root, validating its contents as
  /// UTF-16.
  ///
  /// - Parameter root: The path root to be interpreted as
  ///   `CInterop.PlatformUnicodeEncoding`.
  ///
  /// On Windows, if the content of the path root isn't a well-formed Unicode
  /// string, this initializer returns `nil`.
  public init?(validating root: FilePath.Root) {
    self.init(_validating: root)
  }
}

FilePath, FilePath.Component, and FilePath.Root gain convenience properties for viewing their content as Strings.

extension FilePath {
  /// Creates a string by interpreting the path’s content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// This property is equivalent to calling `String(decoding: path)`
  public var string: String { get }
}

extension FilePath.Component {
  /// Creates a string by interpreting the component’s content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// This property is equivalent to calling `String(decoding: component)`.
  public var string: String { get }
}

extension FilePath.Root {
  /// On Unix, this returns `"/"`.
  ///
  /// On Windows, interprets the root's content as UTF-16.
  ///
  /// This property is equivalent to calling `String(decoding: root)`.
  public var string: String { get }
}

Rationale: While System strongly encourages the use of strong types for handling paths and path operations, systems programming has a long history of using weakly typed strings as paths. These properties enable more rapid prototyping and easier testing while being far more discoverable and ergonomic than the corresponding String initializers. This API (anti)pattern is to be used sparingly.

Separators are always normalized

FilePath now normalizes directory separators on construction and maintains this invariant across mutations. In the relative portion of the path, FilePath will strip trailing separators and coalesce repeated separators.

  FilePath("/a/b/") == "/a/b"
  FilePath("a///b") == "a/b"

Rationale: Normalization provides a simpler and safer internal representation. For example, a trailing slash can give the false impression that the last component is a directory, leading to correctness and security hazards.
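
For Unix-style paths, the separator normalization described above amounts to splitting on the separator and rejoining. The following is a minimal sketch under that assumption; `normalizingSeparators` is a hypothetical helper and does not cover the Windows rules discussed next.

```swift
// Hypothetical model of separator normalization for Unix-style paths.
// This is not the FilePath implementation.
func normalizingSeparators(_ raw: String) -> String {
  // split drops empty pieces, which coalesces repeated separators and
  // strips any trailing separator in one step.
  let body = raw.split(separator: "/").joined(separator: "/")
  // Keep a single leading separator if the input had a root.
  return raw.hasPrefix("/") ? "/" + body : body
}

print(normalizingSeparators("/a/b/"))  // /a/b
print(normalizingSeparators("a///b"))  // a/b
```

Storing paths in this form means later operations never need to reason about empty components.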

Windows accepts either forward slashes (/) or backslashes (\) as directory separators, though the platform's preferred separator is the backslash. On Windows, FilePath normalizes forward slashes to backslashes on construction. Backslashes after a UNC server/share or DOS device path's volume are treated as part of the root.

  FilePath("C:/foo/bar/") == #"C:\foo\bar"#
  FilePath(#"\\server\share\folder\"#) == #"\\server\share\folder"#
  FilePath(#"\\server\share\"#) == #"\\server\share\"#
  FilePath(#"\\?\volume\"#) == #"\\?\volume\"#

Future Work: Ability to print out paths in a specified format, perhaps encoded à la RFC 1808.

Wide and narrow characters

Unix paths are represented as contiguous CChars in memory and are converted to a String by validating as UTF-8. Windows paths are represented as contiguous UInt16s in memory and are converted to a String by validating as UTF-16. Either platform may have invalid Unicode content, which only affects the conversion to Swift's Unicode-correct String type (i.e. it does not affect the semantics of other FilePath operations).

To avoid polluting the global namespace with more typealiases now and in the future, introduce CInterop to hold typealiases to (often weakly-typed) types imported from C. CModeT was defined as a global typealias in System 0.0.1 and is now nested inside CInterop, alongside CChar.

/// A namespace for C and platform types
public enum CInterop {
  #if os(macOS) || os(iOS) || os(watchOS) || os(tvOS)
  /// The C `mode_t` type.
  public typealias Mode = UInt16
  #elseif os(Windows)
  /// The C `mode_t` type.
  public typealias Mode = Int32
  #else
  /// The C `mode_t` type.
  public typealias Mode = UInt32
  #endif

  /// The C `char` type
  public typealias Char = CChar
}

To aid readability and make it easier to write code agnostic to the platform's character-width, introduce typealiases for the platform's preferred character and Unicode encoding.

extension CInterop {
  #if os(Windows)
  /// The platform's preferred character type. On Unix, this is an 8-bit C
  /// `char` (which may be signed or unsigned, depending on platform). On
  /// Windows, this is `UInt16` (a "wide" character).
  public typealias PlatformChar = UInt16
  #else
  /// The platform's preferred character type. On Unix, this is an 8-bit C
  /// `char` (which may be signed or unsigned, depending on platform). On
  /// Windows, this is `UInt16` (a "wide" character).
  public typealias PlatformChar = CInterop.Char
  #endif

  #if os(Windows)
  /// The platform's preferred Unicode encoding. On Unix this is UTF-8 and on
  /// Windows it is UTF-16. Native strings may contain invalid Unicode,
  /// which will be handled by either error-correction or failing, depending
  /// on API.
  public typealias PlatformUnicodeEncoding = UTF16
  #else
  /// The platform's preferred Unicode encoding. On Unix this is UTF-8 and on
  /// Windows it is UTF-16. Native strings may contain invalid Unicode,
  /// which will be handled by either error-correction or failing, depending
  /// on API.
  public typealias PlatformUnicodeEncoding = UTF8
  #endif
}
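A sketch of how such typealiases could let one function body serve both character widths follows. `SketchInterop` is a stand-in namespace defined here so the example compiles without System, and `decodePlatformChars` is a hypothetical helper; neither name is proposed API.

```swift
/// Stand-in for the proposed `CInterop` typealiases so that this sketch
/// compiles on its own.
enum SketchInterop {
  #if os(Windows)
  typealias PlatformChar = UInt16
  typealias PlatformUnicodeEncoding = UTF16
  #else
  typealias PlatformChar = CChar
  typealias PlatformUnicodeEncoding = UTF8
  #endif
}

/// Hypothetical helper: decodes platform characters (without a terminator)
/// into a String, repairing ill-formed content.
func decodePlatformChars(_ chars: [SketchInterop.PlatformChar]) -> String {
  typealias Encoding = SketchInterop.PlatformUnicodeEncoding
  // `CChar` may be signed, so reinterpret each value's bit pattern as the
  // encoding's unsigned code unit. On Windows this conversion is an identity.
  let codeUnits = chars.map { Encoding.CodeUnit(truncatingIfNeeded: $0) }
  return String(decoding: codeUnits, as: Encoding.self)
}

// ASCII content converts to the platform character type on either platform.
let greeting = "hi".unicodeScalars.map { SketchInterop.PlatformChar($0.value) }
let decodedGreeting = decodePlatformChars(greeting)
```

The function body never mentions UTF-8, UTF-16, or a concrete integer width; only the typealiases differ between platforms.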

String, FilePath, FilePath.Component, and FilePath.Root gain "escape hatch" APIs for C interoperability using these typealiases.

extension String {
  /// Creates a string by interpreting the null-terminated platform string as
  /// UTF-8 on Unix and UTF-16 on Windows.
  ///
  /// - Parameter platformString: The null-terminated platform string to be
  ///  interpreted as `CInterop.PlatformUnicodeEncoding`.
  ///
  /// If the content of the platform string isn't well-formed Unicode,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that, depending on the semantics of the specific platform,
  /// conversion to a string and back might result in a value that's different
  /// from the original platform string.
  public init(platformString: UnsafePointer<CInterop.PlatformChar>)

  /// Creates a string by interpreting the null-terminated platform string as
  /// UTF-8 on Unix and UTF-16 on Windows.
  ///
  /// - Parameter platformString: The null-terminated platform string to be
  ///  interpreted as `CInterop.PlatformUnicodeEncoding`.
  ///
  /// If the content of the platform string isn't well-formed Unicode,
  /// this initializer returns `nil`.
  public init?(
    validatingPlatformString platformString: UnsafePointer<CInterop.PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the string,
  /// represented as a null-terminated platform string.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<CInterop.PlatformChar>) throws -> Result
  ) rethrows -> Result
}

extension FilePath {
  /// Creates a file path by copying bytes from a null-terminated platform
  /// string.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform
  ///   string.
  public init(platformString: UnsafePointer<CInterop.PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the file path,
  /// represented as a null-terminated platform string.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<CInterop.PlatformChar>) throws -> Result
  ) rethrows -> Result
}

extension FilePath.Component {
  /// Creates a file path component by copying bytes from a null-terminated
  /// platform string.
  ///
  /// Returns `nil` if `platformString` is empty, is a root, or has more than
  /// one component in it.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform
  ///   string.
  public init?(platformString: UnsafePointer<CInterop.PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the file path
  /// component, represented as a null-terminated platform string.
  ///
  /// If this is not the last component of a path, an allocation will occur in
  /// order to add the null terminator.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<CInterop.PlatformChar>) throws -> Result
  ) rethrows -> Result
}

extension FilePath.Root {
  /// Creates a file path root by copying bytes from a null-terminated platform
  /// string.
  ///
  /// Returns `nil` if `platformString` is empty or is not a root.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform
  ///   string.
  public init?(platformString: UnsafePointer<CInterop.PlatformChar>) {
    self.init(_platformString: platformString)
  }

  /// Calls the given closure with a pointer to the contents of the file path
  /// root, represented as a null-terminated platform string.
  ///
  /// If the path has a relative portion, an allocation will occur in order to
  /// add the null terminator.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<CInterop.PlatformChar>) throws -> Result
  ) rethrows -> Result {
    try _withPlatformString(body)
  }
}
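The scoped-pointer shape of these APIs mirrors the standard library's `withCString`, which is used below as a stand-in so the sketch runs without System. `lengthOfNullTerminated` is a hypothetical substitute for a C function that consumes a null-terminated string; the proposed `withPlatformString` would pass `CInterop.PlatformChar` instead of `CChar`.

```swift
/// Stand-in for a C API that reads a null-terminated string: counts the
/// characters before the terminator.
func lengthOfNullTerminated(_ pointer: UnsafePointer<CChar>) -> Int {
  var count = 0
  while pointer[count] != 0 { count += 1 }
  return count
}

let name = "caf\u{E9}"
// The pointer is valid only inside the closure, so copy out whatever is
// needed instead of storing the pointer itself.
let (length, roundTripped) = name.withCString { pointer in
  (lengthOfNullTerminated(pointer), String(cString: pointer))
}
```

Here `length` counts UTF-8 code units rather than characters, and `roundTripped` is an independent copy that remains valid after the closure returns.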

Future Work: Make the currently-internal SystemString public. SystemString handles the stringy implementation of FilePath, and may be useful to expose on its own.

Future Work: Windows-only APIs for widening narrow ASCII-only native strings, and narrowing ASCII content, for compatibility reasons.

Rejected or deferred alternatives

Deferred: Introduce RelativePath and AbsolutePath

FilePath is the most faithful surfacing of the systems programming concept of a path in Swift, and is the right type to provide to end-point developers as well. Libraries and tools built on top of System raise some notion of "canonical" paths to type-level salience, and this often falls out as absolute vs relative paths.

While System is strongly in favor of strong types and enabling libraries and tools built on top of System to use stronger types, absolute vs relative is not the only potential top-level distinction:

  • Lexically-normalized absolute paths are cheap to compute and check (e.g. SwiftPM's AbsolutePath type).
  • Semantically-normal paths (i.e. with symlinks and environment variables expanded) may be more important.
  • Equivalency-normal paths, which are semantically-normal plus things like Unicode normalization and case-folding (such that path equality is binary equality), could be desired for security and performance reasons.

Additionally, each specific tool and library may have a slightly different notion of "absolute". For example, some tools might consider pre-shell-expansion of ~ to be a valid start to an "absolute" path for their purposes.

We're deferring adding RelativePath, AbsolutePath, and any normalized or canonical variants until this design space is better understood. There's a chance that a future System will add a common protocol for such types. For now, libraries and tools can define strongly-typed wrappers which check their preconditions on initialization.
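A minimal sketch of such a wrapper is shown below. `CheckedAbsolutePath` is a hypothetical type, and it wraps a String with a Unix-only check so that it compiles on its own; a real wrapper would more likely store a FilePath and apply whatever notion of "absolute" the tool needs.

```swift
/// Hypothetical wrapper (not System API): a path verified to be absolute, in
/// the Unix sense, at the time it is created.
struct CheckedAbsolutePath: Equatable {
  let rawValue: String

  /// Returns `nil` unless `path` begins at the root.
  init?(_ path: String) {
    // The precondition is checked once, here, so every value of this type
    // can be assumed to satisfy it afterwards.
    guard path.hasPrefix("/") else { return nil }
    self.rawValue = path
  }
}

let absolute = CheckedAbsolutePath("/usr/bin")
let relative = CheckedAbsolutePath("usr/bin")
```

With this check `absolute` is non-nil and `relative` is nil. A tool that treats a leading ~ as absolute would simply widen the guard.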

Dropped: basename and dirname names and setters

Prior proposal versions had basename and dirname, following Unix semantics. These names were controversial (though other names were also problematic). Their proper use was application-specific as they were not lexically normalizing (e.g. /bin/ls/..'s dirname would be /bin/ls instead of the lexical parent /).

Since Components are non-roots, lastComponent and removingLastComponent can serve as suitable names for basename and dirname. We add unavailable-renamed declarations for new users to System looking for the classic C functions, which can show up in code completion and editor fixits.

Since ComponentView is now a view of the non-root components, it can conform to RangeReplaceableCollection to provide all the standard Swift mutation algorithms. These supersede the need for the dirname and basename setters. basename's setter in particular could be confusing, as it could be an append, a remove, or a replace, depending on whether the path was just a root or whether the new value was nil.
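To show why the standard mutation algorithms cover the setter use cases, the sketch below uses a plain array of strings as a stand-in for the non-root components. The function `editComponents` is hypothetical; the array only models the collection operations, not ComponentView itself.

```swift
/// Stand-in for editing a path's non-root components, using an array in
/// place of `ComponentView`.
func editComponents() -> [String] {
  var components = ["usr", "local", "bin"]  // As in /usr/local/bin.
  components.removeLast()                   // Remove: ["usr", "local"]
  components.append("lib")                  // Append: ["usr", "local", "lib"]
  // Replace the last component through the general range-replacement
  // entry point.
  let last = components.index(before: components.endIndex)
  components.replaceSubrange(last..., with: ["share"])
  return components
}
```

Each of the three cases that a basename setter would have folded into one assignment is an explicit, separately named operation here.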

Rejected: "Root" only refers to a separator, does not include Windows volumes

As proposed, the term "root" on Windows contains everything up to and including the directory separator. This provides a decomposition of a path into a (root?, relative), where relative can be empty.

C#'s GetPathRoot() will return the Windows volume information (e.g. drive letter and colon) in addition to the separator. Windows seems to consider this information part of the "root".

C++17 uses the term "root" similarly, and provides a decomposition into "root name" and "root directory". C++17 does not give any further help parsing the "root name", so this decomposition is fairly trivial. Since FilePath normalizes directory separators, there is little value in providing this decomposition; instead we'd like to provide richer APIs.

Future Windows-only API includes the ability to inspect the syntactic form of the root and extract volume information. This is prototyped here as two enums: one for the syntactic form (e.g. Traditional DOS vs DOS device syntax) and one to get the volume information (e.g. drive letter or UNC server/share).

Rust uses the term "root" to refer to the directory and "prefix" (term original to Rust, AFAICT) to refer to everything prior to it. Rust provides a decomposition of Windows prefixes into their syntactic form containing volume information. Our prototype is heavily inspired by Rust, but we separate syntactic form from the volume information, which we feel could be cleaner and easier to use.

An alternative could be a path decomposition into (windowsPrefix?, root?, relative) on Windows and (root?, relative) on Unix. However, we feel this is more likely to result in code maladapted for Windows than the approach proposed.

Deferred: Ability to work with paths from another platform

A cross-platform application which targets a specific platform (e.g. a script to ssh into a Linux server) might want to be able to construct and interact with paths from both the host and the target platforms.

Windows and Unix paths fundamentally have different semantics. For example, the path //server/share/file would have a root of \\server\share\ on Windows and a single file component. Calling removeLastComponent() any number of times would keep the root \\server\share\. But on Unix it would have a root of / and three components (server, share, and file), each of which could be removed via removeLastComponent(). Similarly, /tmp is (drive-)relative on Windows and absolute on Unix.
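The difference can be sketched with two deliberately simplified decompositions. Both functions are hypothetical and operate on plain strings; the Windows one recognizes only the UNC form and ignores drive letters, device paths, and drive-relative paths.

```swift
/// Hypothetical Unix-style decomposition: a leading separator is the root and
/// everything else is a component.
func unixDecomposition(_ path: String) -> (root: String?, components: [String]) {
  let root: String? = path.hasPrefix("/") ? "/" : nil
  return (root, path.split(separator: "/").map(String.init))
}

/// Hypothetical Windows-style decomposition that recognizes only the UNC form
/// `\\server\share\...` and returns `nil` for every other kind of path.
func uncDecomposition(_ path: String) -> (root: String, components: [String])? {
  // Normalize forward slashes to backslashes first.
  let normalized = String(path.map { (c: Character) -> Character in
    c == "/" ? "\\" : c
  })
  guard normalized.hasPrefix("\\\\") else { return nil }
  let parts = normalized.split(separator: "\\").map(String.init)
  guard parts.count >= 2 else { return nil }
  // Server and share belong to the root, so they are not components.
  let root = "\\\\" + parts[0] + "\\" + parts[1] + "\\"
  return (root, Array(parts.dropFirst(2)))
}

let asUnix = unixDecomposition("//server/share/file")
let asWindows = uncDecomposition("//server/share/file")
```

For the same input, the Unix sketch yields the root / with three components, while the UNC sketch yields the root \\server\share\ with the single component file, which is the divergence described above.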

Due to the differences in semantics, this cannot in general be handled just by printing the path differently. This also can't be done contextually with thread-local storage due to incompatibility with Swift's upcoming concurrency model. If Swift adds support for task-local storage in the future, we can explore a contextual approach, though it would still require reasoning about escapes.

It's possible that in the future we introduce explicit UnixPath and WindowsPath types, conforming to a common protocol, and add a bit in the internals of FilePath to support this behavior.

Source and ABI stability impact

API changes are strictly additive.

Separator normalization does not affect the semantics of path operations. It can change how paths are printed, compared, and hashed (this proposal argues these changes are for the better).

Deprecations

A handful of APIs have been deprecated in favor of better-named alternatives.

@available(*, deprecated, renamed: "CInterop.Mode")
public typealias CModeT = CInterop.Mode

extension FilePath {
  @available(*, deprecated, renamed: "init(validating:)")
  public init?(validatingUTF8 path: FilePath)

  @available(*, deprecated, renamed: "init(platformString:)")
  public init(cString: UnsafePointer<CChar>)

  @available(*, deprecated, renamed: "withPlatformString(_:)")
  public func withCString<Result>(
    _ body: (UnsafePointer<CChar>) throws -> Result
  ) rethrows -> Result
}

extension String {
  @available(*, deprecated, renamed: "init(validating:)")
  public init?(validatingUTF8 path: FilePath)
}