@milseman
Last active January 21, 2021 20:26
[Proposal] FilePath Syntactic Operations

FilePath Syntactic Operations

Introduction

FilePath appeared in System 0.0.1 with a minimal API. This proposal adds API for syntactic operations, which are performed on the structure of the path and thus do not consult with the file system or make any system calls. These include inspecting the structure of paths, modifying paths, and accessing individual components.

Additionally, this proposal greatly expands Windows support and enables writing platform-agnostic path manipulation code.

Future Work: Operations that consult the file system, e.g. resolving symlinks.

Pull request

Design

Windows support

Furthering Swift's push for Windows and System's initial Windows support, this proposal updates FilePath to work well on Windows in addition to Unix platforms.

The proposed API is designed to work naturally and intuitively with both Unix-style and Windows paths. Most of the concepts and even terminology are shared across platforms, though there are some minor differences (e.g. this proposal uses the word "absolute" to refer to what is formally called "fully-qualified" on Windows).

Introducing FilePath.Component

FilePath.Component represents a single component of a path. A component can be a root (which must occur at the front of the path), a special directory (. or ..), or a relative path component such as a directory or a file. Components are always non-empty, and non-root components do not contain a directory separator.

extension FilePath {
  /// Represents an individual component of a file path.
  ///
  /// Components can be one of the special directory components (`.` or `..`), a root
  /// (e.g. `/`), or a file or directory name. Components are never empty and non-root
  /// components never contain the directory separator.
  public struct Component: Hashable {
    /// Whether this component is the root of its path.
    public var isRoot: Bool { get }

    /// Whether this component is the special directory `.`, representing the current directory.
    public var isCurrentDirectory: Bool { get }

    /// Whether this component is the special directory `..`, representing the parent directory.
    public var isParentDirectory: Bool { get }

    /// Whether this component is either special directory `.` or `..`.
    public var isSpecialDirectory: Bool { get }
  }
}

FilePath.Component can conveniently be created from a string literal and can be printed, just like FilePath.

extension FilePath.Component: CustomStringConvertible, CustomDebugStringConvertible, ExpressibleByStringLiteral {
  /// A textual representation of the path component.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var description: String { get }

  /// A textual representation of the path component, suitable for debugging.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var debugDescription: String { get }

  /// Create a file path component from a string literal.
  ///
  /// Precondition: `stringLiteral` is non-empty and has only one component in it.
  public init(stringLiteral: String)

  /// Create a file path component from a string.
  ///
  /// Precondition: `string` is non-empty and has only one component in it.
  public init(_ string: String)
}

FilePath.ComponentView

FilePath.ComponentView is a BidirectionalCollection of the components that comprise a path. The Index type is an opaque wrapper around FilePath's underlying storage index.

extension FilePath {
  /// A bidirectional collection of the components that make up a file path.
  ///
  /// If the path has a root, it will be the first component. All other components will be part
  /// of the relative path.
  public struct ComponentView: BidirectionalCollection { }

  /// View the components that make up this path.
  public var components: ComponentView
}
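As a rough illustration of what the component view exposes, here is a minimal sketch that splits a Unix-style path string into components with the root first. It works on plain `String` values rather than the proposed `FilePath`, and `unixComponentsSketch` is a hypothetical name used only for this example; Windows roots are not handled.

```swift
// Hypothetical stand-in for iterating `FilePath.components` (Unix-style only).
func unixComponentsSketch(of path: String) -> [String] {
  var result: [String] = []
  // A leading separator is the root, which is always the first component.
  if path.first == "/" { result.append("/") }
  // `split` drops the empty pieces produced by repeated or trailing separators.
  result += path.split(separator: "/").map(String.init)
  return result
}

print(unixComponentsSketch(of: "/usr/bin/ls")) // ["/", "usr", "bin", "ls"]
print(unixComponentsSketch(of: "a///b/"))      // ["a", "b"]
```

Note that the root is modeled as an ordinary first element here, which is the heterogeneity the proposal cites as the reason `components` is not `RangeReplaceableCollection`.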

FilePath can be created from a component or a collection of components. FilePath.init(_:ComponentView.SubSequence) is a more performant overload that can directly access the underlying storage, which already has normalized separators between components.

extension FilePath {
  /// Create a file path from a collection of components.
  public init<C>(_ components: C) where C: Collection, C.Element == Component

  /// Create a file path from a single component.
  public init(_ component: Component)

  /// Create a file path from a slice of another path's components.
  public init(_ components: ComponentView.SubSequence)
}

Future Work: A RangeReplaceable+Bidirectional relativeComponents member. FilePath.components cannot be RangeReplaceable because roots make it heterogeneous and hazardous to use generically. Similarly, an opaque contiguous RangeReplaceable+RandomAccess chars member.

Basic queries

extension FilePath {
  /// Returns whether `other` is a prefix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.starts(with: "/")              // true
  ///     path.starts(with: "/usr/bin")       // true
  ///     path.starts(with: "/usr/bin/ls")    // true
  ///     path.starts(with: "/usr/bin/ls///") // true
  ///     path.starts(with: "/us")            // false
  public func starts(with other: FilePath) -> Bool

  /// Returns whether `other` is a suffix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.ends(with: "ls")             // true
  ///     path.ends(with: "bin/ls")         // true
  ///     path.ends(with: "usr/bin/ls")     // true
  ///     path.ends(with: "/usr/bin/ls///") // true
  ///     path.ends(with: "/ls")            // false
  public func ends(with other: FilePath) -> Bool

  /// Whether this path is empty
  public var isEmpty: Bool { get }
}
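The "whole path components" rule can be sketched by comparing component arrays instead of characters. The following is a simplified model on Unix-style strings, not the proposed implementation; `partsSketch`, `pathStarts`, and `pathEnds` are hypothetical helper names.

```swift
// Split a Unix-style path into components, with "/" as the first element if rooted.
func partsSketch(_ path: String) -> [String] {
  let root: [String] = path.first == "/" ? ["/"] : []
  return root + path.split(separator: "/").map(String.init)
}

// Prefix test on whole components, so "/us" is not a prefix of "/usr/bin/ls".
func pathStarts(_ path: String, with prefix: String) -> Bool {
  partsSketch(path).starts(with: partsSketch(prefix))
}

// Suffix test on whole components, done by comparing the reversed sequences.
func pathEnds(_ path: String, with suffix: String) -> Bool {
  partsSketch(path).reversed().starts(with: partsSketch(suffix).reversed())
}

print(pathStarts("/usr/bin/ls", with: "/usr/bin")) // true
print(pathStarts("/usr/bin/ls", with: "/us"))      // false
print(pathEnds("/usr/bin/ls", with: "bin/ls"))     // true
print(pathEnds("/usr/bin/ls", with: "/ls"))        // false
```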

Windows roots are more complex. They can take several different syntactic forms and can carry additional information such as a drive letter or server/share information. Moreover, the presence of a root does not mean that the path is absolute (i.e. "fully-qualified" in Windows-speak).

For example, C:foo refers to foo relative to the current directory on the C drive, and \foo refers to foo at the root of the current drive. Neither of those is absolute, i.e. fully-qualified, even though they have roots.

extension FilePath {
  /// Returns true if this path uniquely identifies the location of
  /// a file without reference to an additional starting location.
  ///
  /// On Unix platforms, absolute paths begin with a `/`. `isAbsolute` is equivalent
  /// to `root != nil`.
  ///
  /// On Windows, absolute paths are fully qualified paths. UNC paths and device paths
  /// are always absolute. Traditional DOS paths are absolute if they begin with a volume or drive
  /// followed by a `:` and a separator.
  ///
  /// NOTE: This does not perform shell expansion or substitute
  /// environment variables; paths beginning with `~` are considered relative.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/local/bin`
  ///   * `/tmp/foo.txt`
  ///   * `/`
  /// * Windows:
  ///   * `C:\Users\`
  ///   * `\\?\UNC\server\share\bar.exe`
  ///   * `\\server\share\bar.exe`
  public var isAbsolute: Bool { get }

  /// Returns true if this path is not absolute (see `isAbsolute`).
  ///
  /// Examples:
  /// * Unix:
  ///   * `~/bar`
  ///   * `tmp/foo.txt`
  /// * Windows:
  ///   * `bar\baz`
  ///   * `C:Users\`
  ///   * `\Users`
  public var isRelative: Bool { get }
}
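To make the Windows distinction between "has a root" and "is absolute" concrete, here is a deliberately simplified sketch. It assumes separators have already been normalized to backslashes and only distinguishes the forms listed above; `isFullyQualifiedSketch` is a hypothetical name, and real root parsing covers more cases than this.

```swift
// Simplified check, assuming backslash-normalized Windows path strings.
func isFullyQualifiedSketch(_ path: String) -> Bool {
  let chars = Array(path)
  let separator: Character = "\\"
  // UNC paths (\\server\share) and device paths (\\?\, \\.\) begin with two separators.
  if chars.count >= 2, chars[0] == separator, chars[1] == separator {
    return true
  }
  // Traditional DOS paths need a drive letter, a colon, and then a separator.
  if chars.count >= 3, chars[0].isASCII, chars[0].isLetter,
     chars[1] == ":", chars[2] == separator {
    return true
  }
  // Everything else (C:foo, \foo, foo\bar) depends on a current drive or directory.
  return false
}

print(isFullyQualifiedSketch(#"C:\Users\"#))              // true
print(isFullyQualifiedSketch(#"\\server\share\bar.exe"#)) // true
print(isFullyQualifiedSketch(#"C:Users\"#))               // false
print(isFullyQualifiedSketch(#"\Users"#))                 // false
```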

Path decomposition and analysis

Paths can be decomposed into their (optional) root and (potentially empty) relative portion. Or, they can be decomposed into their (optional) final relative component (basename) and the directory of that component (dirname).

extension FilePath {
  /// Returns the root directory of a path if there is one, otherwise `nil`.
  ///
  /// On Unix, this will return the leading `/` if the path is absolute
  /// and `nil` if the path is relative.
  ///
  /// On Windows, for traditional DOS paths, this will return
  /// the path prefix up to and including a root directory or
  /// a supplied drive or volume. Otherwise, if the path is relative to
  /// both the current directory and current drive, returns `nil`.
  ///
  /// On Windows, for UNC or device paths, this will return the path prefix
  /// up to and including the host and share for UNC paths or the volume for
  /// device paths followed by any subsequent separator.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/foo/bar => /`
  ///   * `foo/bar  => nil`
  /// * Windows:
  ///   * `C:\foo\bar                => C:\`
  ///   * `C:foo\bar                 => C:`
  ///   * `\foo\bar                  => \`
  ///   * `foo\bar                   => nil`
  ///   * `\\server\share\file       => \\server\share\`
  ///   * `\\?\UNC\server\share\file => \\?\UNC\server\share\`
  ///   * `\\.\device\folder         => \\.\device\`
  ///
  /// Setting the root to `nil` will remove the root and setting a new
  /// root will replace the root. Passing a non-root to the setter will trap.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/foo/bar"
  ///     path.root = nil // path is "foo/bar"
  ///     path.root = "/" // path is "/foo/bar"
  ///
  /// Example (Windows):
  ///
  ///     var path: FilePath = #"\foo\bar"#
  ///     path.root = nil         // path is #"foo\bar"#
  ///     path.root = "C:"        // path is #"C:foo\bar"#
  ///     path.root = #"C:\"#     // path is #"C:\foo\bar"#
  ///
  public var root: FilePath.Component? { get set }

  /// Gets or sets the relative portion of the path (everything after root).
  ///
  /// Examples:
  /// * Unix:
  ///   * `/foo/bar => foo/bar`
  ///   * `foo/bar  => foo/bar`
  ///   * `/        => ""`
  /// * Windows:
  ///   * `C:\foo\bar                  => foo\bar`
  ///   * `foo\bar                     => foo\bar`
  ///   * `\\?\UNC\server\share\file   => file`
  ///   * `\\?\device\folder\file.exe  => folder\file.exe`
  ///   * `\\server\share\file         => file`
  ///   * `\                           => ""`
  ///
  ///
  /// Setting a relative path replaces everything after the root with `newValue`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/foo/bar"
  ///     path.relativePath = "tmp/file.txt" // path is "/tmp/file.txt"
  ///     path.relativePath = ""             // path is "/"
  ///
  public var relativePath: FilePath { get set }

  /// Returns the final relative component of the path.
  /// Returns `nil` if the path is empty or only contains a root.
  ///
  /// Note: Even if the final relative component is a special directory
  /// (`.` or `..`), it will still be returned. See `lexicallyNormalize()`.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/local/bin/ => bin`
  ///   * `/tmp/foo.txt    => foo.txt`
  ///   * `/tmp/foo.txt/.. => ..`
  ///   * `/tmp/foo.txt/.  => .`
  ///   * `/               => nil`
  /// * Windows:
  ///   * `C:\Users\                    => Users`
  ///   * `C:Users\                     => Users`
  ///   * `C:\                          => nil`
  ///   * `\Users\                      => Users`
  ///   * `\\?\UNC\server\share\bar.exe => bar.exe`
  ///   * `\\server\share               => nil`
  ///   * `\\?\UNC\server\share\        => nil`
  ///
  /// Setting the basename to `nil` pops off the last relative
  /// component, otherwise it will replace it with `newValue`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/bin/ls"
  ///     path.basename = "cat" // path is "/usr/bin/cat"
  ///     path.basename = nil   // path is "/usr/bin"
  ///
  public var basename: FilePath.Component? { get set }

  /// Creates a new path with everything up to but not including the `basename`.
  ///
  /// If the path only contains a root, returns `self`.
  /// If the path has no root and only includes a single component,
  /// returns an empty FilePath.
  ///
  /// Examples:
  ///  * `/usr/bin/ls => /usr/bin`
  ///  * `/foo        => /`
  ///  * `/           => /`
  ///  * `foo         => ""`
  ///
  /// Setting the `dirname` replaces everything before `basename` with `newValue`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/bin/ls"
  ///     path.dirname = "/bin" // path is "/bin/ls"
  ///     path.dirname = ""     // path is "ls"
  ///
  public var dirname: FilePath { get set }
}

extension FilePath.ComponentView {
  /// The root component, if it exists. See `FilePath.root`.
  public var root: FilePath.Component? { get }

  /// The portion of this path after the root.
  public var relativeComponents: SubSequence { get }

  /// The final relative component of the path. Returns `nil` if the path is empty or only
  /// contains a root. See `FilePath.basename`.
  public var basename: FilePath.Component?

  /// The portion of this path with everything up to but not including the `basename`.
  public var dirname: SubSequence
}
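The dirname/basename decomposition described above can be modeled in a few lines. This sketch covers Unix-style strings only and uses a hypothetical `decomposeSketch` function; it mirrors the documented examples but is not the proposed implementation.

```swift
// Split a Unix-style path into (dirname, basename) purely syntactically.
func decomposeSketch(_ path: String) -> (dirname: String, basename: String?) {
  let hasRoot = path.first == "/"
  var parts = path.split(separator: "/").map(String.init)
  // The basename is the final relative component; nil for "" and "/".
  let basename = parts.popLast()
  // The dirname is everything before it, keeping the root if there is one.
  let dirname = (hasRoot ? "/" : "") + parts.joined(separator: "/")
  return (dirname, basename)
}

let ls = decomposeSketch("/usr/bin/ls")
print(ls.dirname, ls.basename ?? "nil")         // /usr/bin ls

// Special directories are not treated specially.
let dotDot = decomposeSketch("/tmp/foo.txt/..")
print(dotDot.dirname, dotDot.basename ?? "nil") // /tmp/foo.txt ..
```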

Rationale: root and basename are of type FilePath.Component?, which is a slice type that avoids allocation. relativePath and dirname create a new FilePath, which is far more ergonomic than a slice of the ComponentView. If slicing is needed (e.g. for inspection without allocation), the ComponentView provides slice variants of these APIs.

Future Work: Windows root analysis APIs that will communicate the syntactic form of the root as well as the means to extract information such as the volume or server/share.

Setters allow for in-place mutation. FilePath.root's setter allows making a path relative or absolute, and even allows switching root representations on Windows. FilePath.relativePath's setter allows a path to point to a new location from the same root (if there is one). FilePath.basename's setter allows a path to point to a peer entry.

Components may be decomposed into their stem and (optional) extension (.txt, .o, .app, etc.). FilePath gains convenience APIs for dealing with the stem and extension of the basename, if it exists.

extension FilePath.Component {
  /// The extension of this file or directory component.
  ///
  /// If `self` does not contain a `.` anywhere, or only
  /// at the start, returns `nil`. Otherwise, returns everything after the last `.`.
  ///
  /// Examples:
  ///   * `foo.txt    => txt`
  ///   * `foo.tar.gz => gz`
  ///   * `Foo.app    => app`
  ///   * `.hidden    => nil`
  ///   * `..         => nil`
  ///
  public var `extension`: String? { get }

  /// The non-extension portion of this file or directory component.
  ///
  /// Examples:
  ///   * `foo.txt => foo`
  ///   * `Foo.app => Foo`
  ///   * `.hidden => .hidden`
  ///   * `..      => ..`
  ///
  public var stem: String { get }
}
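The stem/extension rule for a single component can be sketched as follows. This operates on a plain `String` and uses a hypothetical `splitExtensionSketch` function; it follows the examples in the doc comments above, including a leading dot not counting as an extension separator.

```swift
// Split one path component into its stem and optional extension.
func splitExtensionSketch(_ component: String) -> (stem: String, ext: String?) {
  guard component != ".", component != "..",
        let dot = component.lastIndex(of: "."),
        dot != component.startIndex
  else {
    // Special directories, dot-less names, and names like ".hidden" have no extension.
    return (component, nil)
  }
  let stem = String(component[..<dot])
  let ext = String(component[component.index(after: dot)...])
  return (stem, ext)
}

print(splitExtensionSketch("foo.tar.gz")) // (stem: "foo.tar", ext: Optional("gz"))
print(splitExtensionSketch(".hidden"))    // (stem: ".hidden", ext: nil)
```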

extension FilePath {
  /// The extension of the last file or directory component.
  ///
  /// If `basename` is `nil` or one of the special path components
  /// `.` or `..`, `get` returns `nil` and `set` does nothing.
  ///
  /// If `basename` does not contain a `.` anywhere, or only
  /// at the start, `get` returns `nil` and `set` will append a
  /// `.` and `newValue` to `basename`.
  ///
  /// Otherwise `get` returns everything after the last `.` and `set` will
  /// replace the extension.
  ///
  /// Examples:
  ///   * `/tmp/foo.txt                 => txt`
  ///   * `/Applications/Foo.app/        => app`
  ///   * `/Applications/Foo.app/bar.txt => txt`
  ///   * `/tmp/foo.tar.gz              => gz`
  ///   * `/tmp/.hidden                 => nil`
  ///   * `/tmp/.hidden.                => ""`
  ///   * `/tmp/..                      => nil`
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp/file"
  ///     path.extension = "txt" // path is "/tmp/file.txt"
  ///     path.extension = "o"   // path is "/tmp/file.o"
  ///     path.extension = nil   // path is "/tmp/file"
  ///     path.extension = ""    // path is "/tmp/file."
  ///
  public var `extension`: String? { get set }

  /// The non-extension portion of the last file or directory component.
  ///
  /// Returns `nil` if `basename` is `nil`.
  ///
  ///   * `/tmp/foo.txt                  => foo`
  ///   * `/Applications/Foo.app/        => Foo`
  ///   * `/Applications/Foo.app/bar.txt => bar`
  ///   * `/tmp/.hidden                 => .hidden`
  ///   * `/tmp/..                      => ..`
  ///   * `/                            => nil`
  public var stem: String? { get }
}

FilePath.extension's setter allows for convenient in-place reassigning or adding/removing of an extension.

Rationale: stem and extension are expressed as Strings which are far more ergonomic than a slice of raw platform chars. Thus, these operations perform Unicode error correction, which is desirable most of the time when reading content (and setting extensions containing invalid Unicode is especially indicative of a programming error).

Rationale: FilePath.stem does not have a setter. Components are never empty and always have stems. Thus, setting a stem to "" or even nil will result in either an invalid component or a hidden file whose new name was an extension. Setting stems in this fashion is indicative of a programming error, and the desired task can be accomplished by working with basename directly.

Rationale: Components in general do not have mutating operations such as setters. Components are slice types and participate in COW; they wouldn't mutate the containing path unless part of an accessor chain. Allowing mutations would give the false impression that modifications would be published back to the containing path, e.g. while iterating the components.

Lexical operations

FilePath supports lexical operations, i.e. operations that do not call into the file system (e.g. to follow symlinks), such as normalization of special directory components (. and ..) and forming relative paths.

extension FilePath {
  /// Whether the path is in lexical-normal form, that is `.` and `..` components have
  /// been collapsed lexically (i.e. without following symlinks).
  ///
  /// Examples:
  /// * `"/usr/local/bin".isLexicallyNormal == true`
  /// * `"../local/bin".isLexicallyNormal   == true`
  /// * `"local/bin/..".isLexicallyNormal   == false`
  public var isLexicallyNormal: Bool { get }

  /// Collapse `.` and `..` components lexically (i.e. without following symlinks).
  ///
  /// Examples:
  /// * `/usr/./local/bin/.. => /usr/local`
  /// * `/../usr/local/bin   => /usr/local/bin`
  /// * `../usr/local/../bin => ../usr/bin`
  public mutating func lexicallyNormalize()

  /// Returns a copy of `self` in lexical-normal form, that is `.` and `..` components
  /// have been collapsed lexically (i.e. without following symlinks). See `lexicallyNormalize()`.
  public var lexicallyNormal: FilePath { get }
}
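One way to picture lexical normalization is as a stack of components. The sketch below handles Unix-style strings only and uses a hypothetical `lexicallyNormalizedSketch` function; it reproduces the three documented examples but makes no claim about how the proposal implements this.

```swift
// Collapse "." and ".." without consulting the file system (Unix-style only).
func lexicallyNormalizedSketch(_ path: String) -> String {
  let hasRoot = path.first == "/"
  var stack: [Substring] = []
  for part in path.split(separator: "/") {
    switch part {
    case ".":
      // The current directory never changes the result.
      continue
    case "..":
      if let last = stack.last, last != ".." {
        // Cancel out the previous ordinary component.
        stack.removeLast()
      } else if !hasRoot {
        // A relative path keeps leading ".." components.
        stack.append(part)
      }
      // For a rooted path, ".." at the root is dropped.
    default:
      stack.append(part)
    }
  }
  return (hasRoot ? "/" : "") + stack.joined(separator: "/")
}

print(lexicallyNormalizedSketch("/usr/./local/bin/..")) // /usr/local
print(lexicallyNormalizedSketch("/../usr/local/bin"))   // /usr/local/bin
print(lexicallyNormalizedSketch("../usr/local/../bin")) // ../usr/bin
```

Because symlinks are never followed, the normalized path may name a different file than the original when a collapsed component is a symlink.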

Modifying paths

FilePath supports common mutation operations.

extension FilePath {
    /// If `prefix` is a prefix of `self`, removes it and returns `true`. Otherwise
    /// returns `false`.
    ///
    /// Example:
    ///
    ///     var path: FilePath = "/usr/local/bin"
    ///     path.stripPrefix("/usr/bin")   // false
    ///     path.stripPrefix("/us")        // false
    ///     path.stripPrefix("/usr/local") // true, path is "bin"
    ///
    public mutating func stripPrefix(_ prefix: FilePath) -> Bool

    /// Push each component of `other`. If `other` has a root, replaces `self` with
    /// `other`.
    ///
    /// Example:
    ///
    ///     var path: FilePath = "/tmp"
    ///     path.append("dir/file.txt") // path is "/tmp/dir/file.txt"
    ///     path.append("/bin")         // path is "/bin"
    ///
    public mutating func append(_ other: FilePath)

    /// If `other` is a relative path component, pushes it onto the end of `self`.
    /// If `other` is a root, replaces `self` with `other`.
    ///
    /// Example:
    ///
    ///     var path: FilePath = "/tmp"
    ///     path.pushLast("dir")      // path is "/tmp/dir"
    ///     path.pushLast("file.txt") // path is "/tmp/dir/file.txt"
    ///     path.pushLast("/")        // path is "/"
    ///
    public mutating func pushLast(_ other: FilePath.Component)

    /// Remove the last component of this file path. If the path is
    /// root or empty, does nothing and returns false.
    ///
    /// To see the component that will be popped, use `basename`. `popLast()` is
    /// equivalent to setting `basename` to `nil`.
    ///
    /// Examples:
    /// * `"/".popLast()        == false // path is "/"`
    /// * `"/foo/bar".popLast() == true  // path is "/foo"`
    ///
    public mutating func popLast() -> Bool

    /// Remove the contents of the path, keeping the null terminator.
    public mutating func removeAll(keepingCapacity: Bool = false)

    /// Reserve enough storage space to store `minimumCapacity` `PlatformChar`s.
    public mutating func reserveCapacity(_ minimumCapacity: Int)
}
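The documented behavior of `append` and `stripPrefix` can be modeled on Unix-style strings as a minimal sketch. The helper names (`rootedParts`, `appendingSketch`, `strippingPrefixSketch`) are hypothetical, and the functions return new values instead of mutating in place.

```swift
// Components of a Unix-style path, with "/" as the first element if rooted.
func rootedParts(_ path: String) -> [String] {
  let root: [String] = path.first == "/" ? ["/"] : []
  return root + path.split(separator: "/").map(String.init)
}

// Appending a rooted path replaces the original; otherwise components are pushed.
func appendingSketch(_ other: String, to path: String) -> String {
  if other.first == "/" { return other }
  if path.isEmpty || other.isEmpty { return path + other }
  return path.last == "/" ? path + other : path + "/" + other
}

// Returns the remainder if `prefix` matches whole leading components, else nil.
func strippingPrefixSketch(_ prefix: String, from path: String) -> String? {
  let whole = rootedParts(path), head = rootedParts(prefix)
  guard whole.starts(with: head) else { return nil }
  return whole.dropFirst(head.count).joined(separator: "/")
}

print(appendingSketch("dir/file.txt", to: "/tmp"))                     // /tmp/dir/file.txt
print(appendingSketch("/bin", to: "/tmp"))                             // /bin
print(strippingPrefixSketch("/usr/local", from: "/usr/local/bin") ?? "nil") // bin
print(strippingPrefixSketch("/us", from: "/usr/local/bin") ?? "nil")        // nil
```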

Rationale: popLast does not return the component, as components are slices of FilePath's underlying storage. Returning a popped component would trigger a COW copy.

Paths and strings

Just like FilePath, FilePath.Component can be decoded/validated into a Swift String.

extension String {
  /// Creates a string by interpreting the path component's content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// - Parameter component: The path component to be interpreted as `PlatformUnicodeEncoding`.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that, depending on the semantics of the specific file system,
  /// conversion to a string and back to a path component
  /// might result in a value that's different from the original path.
  public init(decoding component: FilePath.Component)

  /// Creates a string from a path component, validating its contents as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// - Parameter component: The path component to be interpreted as `PlatformUnicodeEncoding`.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this initializer returns `nil`.
  public init?(validating component: FilePath.Component)
}
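The caveat about decoding not round-tripping can be seen with the standard library alone. This small sketch uses raw UTF-8 bytes in place of a path component (the byte values are made up for illustration) and the existing `String(decoding:as:)` initializer, which performs the same kind of error correction.

```swift
// "fo" followed by 0xFF, which can never appear in well-formed UTF-8.
let rawBytes: [UInt8] = [0x66, 0x6F, 0xFF]

// Decoding repairs the invalid byte by substituting U+FFFD.
let decoded = String(decoding: rawBytes, as: UTF8.self)
print(decoded == "fo\u{FFFD}") // true

// Converting back yields the three-byte encoding of U+FFFD, not the original byte.
let roundTripped = Array(decoded.utf8)
print(roundTripped == rawBytes) // false
```

This is why a validating initializer that returns `nil` is offered alongside the decoding one: callers that need to get back the exact original bytes can detect when that is not possible.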

FilePath and FilePath.Component gain convenience properties for viewing their content as Strings.

extension FilePath {
  /// Creates a string by interpreting the path’s content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// This property is equivalent to calling `String(decoding: path)`.
  public var string: String

  /// Creates an array of strings representing the components of this
  /// path. Interprets the file path’s content as UTF-8 on Unix and UTF-16 on Windows.
  ///
  /// If the content of the path isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD.
  public var componentStrings: [String]
}

extension FilePath.Component {
  /// Creates a string by interpreting the component’s content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// This property is equivalent to calling `String(decoding: component)`.
  public var string: String
}

Rationale: While System strongly encourages the use of strong types for handling paths and path operations, systems programming has a long history of using weakly typed strings as paths. These properties enable more rapid prototyping and easier testing while being far more discoverable and ergonomic than the corresponding String initializers. This API (anti)pattern is to be used sparingly.

Separators are always normalized

FilePath now normalizes directory separators on construction and maintains this invariant across mutations. In the relative portion of the path, FilePath will strip trailing separators and coalesce repeated separators.

  FilePath("/a/b/") == "/a/b"
  FilePath("a///b") == "a/b"

Rationale: Normalization provides a simpler and safer internal representation. For example, a trailing slash can give the false impression that the last component is a directory, leading to correctness and security hazards.
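For Unix-style paths, the normalization rule can be sketched as "keep the root, then rejoin the non-empty pieces". `normalizingSeparatorsSketch` is a hypothetical function operating on plain strings, shown only to make the invariant concrete.

```swift
// Strip trailing separators and coalesce repeated ones (Unix-style only).
func normalizingSeparatorsSketch(_ path: String) -> String {
  let root = path.first == "/" ? "/" : ""
  // `split` omits empty subsequences, which removes redundant separators.
  return root + path.split(separator: "/").joined(separator: "/")
}

print(normalizingSeparatorsSketch("/a/b/")) // /a/b
print(normalizingSeparatorsSketch("a///b")) // a/b
print(normalizingSeparatorsSketch("/"))     // /
```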

Windows accepts either forward slashes (/) or backslashes (\) as directory separators, though the platform's preferred separator is backslash. On Windows, FilePath normalizes forward slashes to backslashes on construction. Backslashes after a UNC server/share or DOS device path's volume are treated as part of the root.

  FilePath("C:/foo/bar/") == #"C:\foo\bar"#
  FilePath(#"\\server\share\folder\"#) == #"\\server\share\folder"#
  FilePath(#"\\server\share\"#) == #"\\server\share\"#
  FilePath(#"\\?\volume\"#) == #"\\?\volume\"#

Future Work: Ability to print out Windows paths in a specified format rather than using the platform-preferred separator. E.g. a so-called "portable" or "generic" form à la C++17 or perhaps encoded à la RFC 1808.

Wide and narrow characters

Unix paths are represented as contiguous CChars in memory and are converted to a String by validating as UTF-8. Windows paths are represented as contiguous UInt16s in memory and are converted to a String by validating as UTF-16. Either platform may have invalid Unicode content, which only affects the conversion to Swift's Unicode-correct String type (i.e. it does not affect the semantics of other FilePath operations).

To aid readability and make it easier to write code agnostic to the platform's character-width, we introduce typealiases for the platform's preferred character and Unicode encoding.

/// The platform's preferred character type. On Unix, this is an 8-bit `CChar` (which
/// may be signed or unsigned, depending on platform). On Windows, this is
/// `UInt16` (a "wide" character).
#if os(Windows)
public typealias PlatformChar = UInt16
#else
public typealias PlatformChar = CChar
#endif

/// The platform's preferred Unicode encoding. On Unix this is UTF-8 and on Windows
/// it is UTF-16. Native strings may contain invalid Unicode,
/// which will be handled by either error-correction or failing, depending on API.
#if os(Windows)
public typealias PlatformUnicodeEncoding = UTF16
#else
public typealias PlatformUnicodeEncoding = UTF8
#endif

String, FilePath, and FilePath.Component gain "escape hatch" APIs for C interoperability using these typealiases.

extension String {
  /// Creates a string by interpreting the null-terminated platform string as
  /// UTF-8 on Unix and UTF-16 on Windows.
  ///
  /// - Parameter platformString: The null-terminated platform string to be
  ///  interpreted as `PlatformUnicodeEncoding`.
  ///
  /// If the content of the platform string isn't well-formed Unicode,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that, depending on the semantics of the specific platform,
  /// conversion to a string and back might result in a value that's different
  /// from the original platform string.
  public init(platformString: UnsafePointer<PlatformChar>)

  /// Creates a string by interpreting the null-terminated platform string as
  /// UTF-8 on Unix and UTF-16 on Windows.
  ///
  /// - Parameter platformString: The null-terminated platform string to be
  ///  interpreted as `PlatformUnicodeEncoding`.
  ///
  /// If the content of the platform string isn't well-formed Unicode,
  /// this initializer returns `nil`.
  public init?(validatingPlatformString: UnsafePointer<PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the string,
  /// represented as a null-terminated platform string.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<PlatformChar>) throws -> Result
  ) rethrows -> Result
}

extension FilePath {
  /// Creates a file path by copying bytes from a null-terminated platform string.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform string.
  public init(platformString: UnsafePointer<PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the file path,
  /// represented as a null-terminated platform string.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<PlatformChar>) throws -> Result
  ) rethrows -> Result
}

extension FilePath.Component {
  /// Creates a file path component by copying bytes from a null-terminated platform string.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform string.
  public init(platformString: UnsafePointer<PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the file path component,
  /// represented as a null-terminated platform string.
  ///
  /// If this is not the last component of a path, an allocation will occur in order to
  /// add the null terminator.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<PlatformChar>) throws -> Result
  ) rethrows -> Result
}
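The shape of the `withPlatformString` escape hatch can be approximated today with the standard library's `withCString` family. The sketch below uses hypothetical names (`PlatformCharSketch`, `withPlatformStringSketch`) so that it does not clash with the proposed API, and it only approximates the proposed behavior for `String`.

```swift
#if os(Windows)
typealias PlatformCharSketch = UInt16
#else
typealias PlatformCharSketch = CChar
#endif

// Hand a null-terminated platform string to `body` for the duration of the call.
func withPlatformStringSketch<Result>(
  _ string: String,
  _ body: (UnsafePointer<PlatformCharSketch>) throws -> Result
) rethrows -> Result {
  #if os(Windows)
  return try string.withCString(encodedAs: UTF16.self, body)
  #else
  return try string.withCString(body)
  #endif
}

// Count code units up to the null terminator, as a C API would.
let unitCount = withPlatformStringSketch("/tmp") { pointer -> Int in
  var count = 0
  while pointer[count] != 0 { count += 1 }
  return count
}
print(unitCount) // 4
```

As with the proposed API, the pointer must not escape the closure.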

Future Work: Make the currently-internal SystemString public. SystemString handles the stringy implementation of FilePath, and may be useful to expose on its own.

Future Work: Windows-only APIs for widening narrow ASCII-only native strings, and narrowing ASCII content, for compatibility reasons.

Rejected or deferred alternatives

Deferred: Introduce RelativePath and AbsolutePath

FilePath is the most faithful surfacing of the systems programming concept of a path in Swift, and is the right type to provide to end-point developers as well. Libraries and tools built on top of System raise some notion of "canonical" paths to type-level salience, and this often falls out as absolute vs relative paths.

While System is strongly in favor of strong types and enabling libraries and tools built on top of System to use stronger types, absolute vs relative is not the only potential top-level distinction:

  • Lexically-normalized absolute is cheap to compute and check (e.g. SwiftPM's AbsolutePath type).
  • Semantically-normal (i.e. expanding symlinks and environment variables) may be more important.
  • Equivalency-normal, which includes semantically-normal plus things like Unicode normalization, case-folding, etc., (such that path equality is binary equality) could be desired for security and performance reasons.

Additionally, each specific tool and library may have a slightly different notion of "absolute". For example, some tools might consider pre-shell-expansion of ~ to be a valid start to an "absolute" path for their purposes.

We're deferring adding RelativePath, AbsolutePath, and any normalized or canonical variants until this design space is better understood. There's a chance that a future System will add a common protocol for such types. For now, libraries and tools can define strongly-typed wrappers which check their preconditions on initialization.
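As a rough illustration of such a wrapper, here is a minimal sketch. The `AbsolutePath` type below is hypothetical and is not part of System. To stay self-contained it wraps a `String` rather than a `FilePath`, and it assumes Unix-style paths only, with "lexically normal" taken to mean no `.`, `..`, repeated, or trailing separators.

```swift
/// Hypothetical wrapper (not a System API): a Unix-style path string that is
/// checked once, at initialization, to be absolute and lexically normal.
struct AbsolutePath: Equatable, Hashable {
  let string: String

  /// Returns nil unless `string` starts at the root and contains no `.`,
  /// `..`, or empty components (repeated or trailing separators).
  init?(validating string: String) {
    guard string.hasPrefix("/") else { return nil }
    if string != "/" {
      // Keep empty pieces so that `//` and a trailing `/` are detected.
      let pieces = string.dropFirst()
        .split(separator: "/", omittingEmptySubsequences: false)
      for piece in pieces {
        if piece.isEmpty || piece == "." || piece == ".." { return nil }
      }
    }
    self.string = string
  }
}
```

Because the check happens in the initializer, code that receives an `AbsolutePath` can rely on the precondition without re-validating. A tool with a different notion of "absolute" would change only the initializer.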

Considering: Alternate names to basename, dirname, and popLast()

Unix uses basename and dirname to refer to the last relative path component and everything up to the last relative path component, respectively. These names apply syntactically and do not treat the special directory components . and .. specially.

For example, /usr/bin/ls/. would decompose into a (dirname, basename) of (/usr/bin/ls, .). "Dir" in dirname would be a misnomer as ls might not be a directory.
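The purely syntactic nature of this split can be sketched as follows. The helper `syntacticSplit` is hypothetical: it works on Unix-style strings rather than on `FilePath`, and it is not the proposed implementation. It only illustrates that `.` and `..` are treated like any other component.

```swift
/// Hypothetical helper (not a System API): splits a Unix-style path into
/// (dirname, basename) by looking only at the text of the path.
func syntacticSplit(_ path: String) -> (dirname: String, basename: String?) {
  // Drops trailing separators but keeps a lone root separator.
  func strippingTrailingSeparators(_ s: Substring) -> Substring {
    var s = s
    while s.count > 1 && s.last == "/" { s.removeLast() }
    return s
  }

  let trimmed = strippingTrailingSeparators(path[...])
  // An empty path or a bare root has no last relative component.
  if trimmed.isEmpty || trimmed == "/" { return (String(trimmed), nil) }

  // A single relative component has an empty dirname.
  guard let slash = trimmed.lastIndex(of: "/") else {
    return ("", String(trimmed))
  }

  let basename = String(trimmed[trimmed.index(after: slash)...])
  // Everything up to the last separator; the separator survives only as a root.
  let dirname = strippingTrailingSeparators(trimmed[...slash])
  return (String(dirname), basename)
}

let parts = syntacticSplit("/usr/bin/ls/.")
// parts.dirname is "/usr/bin/ls" and parts.basename is "."
```

Nothing here consults the file system, so the result is the same whether or not `ls` is a directory.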

Alternate names could be lastRelativeComponent for basename and parent for dirname. popLast() could instead be named removeLastRelativeComponent(). lastRelativeComponent and removeLastRelativeComponent() are very verbose, but they do precisely describe the semantics of the operation rather than rely on Unix precedent. The use of parent instead of dirname avoids any implication that the result is definitely a directory, though it may still imply a false parent-child relationship when the last component is a special directory (. or ..).

We are going with basename and dirname as concise terms with technical precedent for their precise semantics. We're very interested in alternatives and feedback. Another alternative could be to only replace dirname with parent.

C++17 provides filename and parent_path. In our opinion, the use of the word "file" gives a stronger, more widespread and dangerous intuition about the status of the result. C++17 also treats trailing separators as significant while FilePath strips trailing separators to avoid all kinds of issues.

Rust provides file_name and parent. Again, we feel the use of the word "file" is hazardous. Rust will return None for file_name for paths ending in either special directories .. or .; however, parent will happily gobble them up, meaning that (file_name, parent) is not a decomposition of the path.

Another alternative could be to split basename into two APIs, one which follows the Unix semantics of basename and one which excludes special directories, and similarly for dirname. One pair of APIs would then be a path decomposition, while the other would follow something similar to Rust's behavior.

Considering: "Root" only refers to a separator, does not include Windows volumes

As proposed, the term "root" on Windows contains everything up to and including the directory separator. This provides a decomposition of a path into a (root?, relative), where relative can be empty.
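To make the (root?, relative) shape concrete, here is a minimal sketch. The function `splitTraditionalDOSRoot` is hypothetical and deliberately narrow: it assumes an already-normalized path and recognizes only the traditional DOS form `C:\...`. UNC paths, DOS device paths, and drive-relative paths such as `C:foo` are out of scope and are reported as having no root, which is an assumption of this sketch rather than the proposed behavior.

```swift
/// Hypothetical sketch (not a System API): splits a Windows path of the
/// traditional DOS form into (root?, relative). The root is everything up to
/// and including the directory separator; `relative` may be empty.
func splitTraditionalDOSRoot(_ path: String) -> (root: String?, relative: String) {
  let chars = Array(path)
  guard chars.count >= 3,
        chars[0].isASCII, chars[0].isLetter,   // drive letter
        chars[1] == ":",
        chars[2] == "\\"                       // directory separator
  else {
    // Forms not recognized by this sketch are treated as rootless.
    return (nil, path)
  }
  return (String(chars[0..<3]), String(chars[3...]))
}

let split = splitTraditionalDOSRoot("C:\\Users\\foo")
// split.root is "C:\" and split.relative is "Users\foo"
```

Concatenating `root` and `relative` gives back the original path, which is what makes this a decomposition.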

C#'s GetPathRoot() will return the Windows volume information (e.g. drive letter and colon) in addition to the separator. Windows seems to consider this information part of the "root".

C++17 uses the term "root" similarly, and provides a decomposition into "root name" and "root directory". C++17 does not give any further help parsing the "root name", so this decomposition is fairly trivial. Since FilePath normalizes directory separators, there is little value in providing this decomposition; instead, we'd like to provide richer APIs.

Future Windows-only API includes the ability to inspect the syntactic form of the root and extract volume information. This is prototyped here as two enums: one for the syntactic form (e.g. Traditional DOS vs DOS device syntax) and one to get the volume information (e.g. drive letter or UNC server/share).

Rust uses the term "root" to refer to the directory and "prefix" (term original to Rust, AFAICT) to refer to everything prior to it. Rust provides a decomposition of Windows prefixes into their syntactic form containing volume information. Our prototype is heavily inspired by Rust, but we separate syntactic form from the volume information, which we feel could be cleaner and easier to use.

An alternative could be a path decomposition into (windowsPrefix?, root?, relative) on Windows and (root?, relative) on Unix. However, we feel this is more likely to result in code maladapted for Windows than the approach proposed.

Source and ABI stability impact

API changes are strictly additive.

Separator normalization does not affect the semantics of path operations. It can change how paths are printed, compared, and hashed (this proposal argues these changes are for the better).
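The effect on comparison and hashing can be illustrated with a small sketch. The function `normalizingSeparators` is hypothetical and assumes Unix-style paths only: it coalesces repeated separators and strips trailing ones, which is a simplified stand-in for what FilePath does, not its implementation.

```swift
/// Hypothetical sketch (not a System API): coalesces repeated `/` separators
/// and strips trailing ones, preserving a leading root separator.
func normalizingSeparators(_ path: String) -> String {
  let root = path.hasPrefix("/") ? "/" : ""
  // `split` omits empty pieces by default, which drops the extra separators.
  return root + path.split(separator: "/").joined(separator: "/")
}

// Three spellings of the same path.
let spellings = ["/usr//bin/", "/usr/bin", "/usr/bin///"]

// Compared as raw strings, the spellings are all distinct.
let rawSpellings = Set(spellings)

// After normalization they compare and hash as a single value.
let normalizedSpellings = Set(spellings.map(normalizingSeparators))
```

Here `rawSpellings` has three elements and `normalizedSpellings` has one, so code that uses paths as dictionary keys or set members no longer sees spurious duplicates.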

Deprecations

A handful of APIs have been deprecated in favor of better-named alternatives.

extension FilePath {
  @available(*, deprecated, renamed: "init(validating:)")
  public init?(validatingUTF8 path: FilePath)

  @available(*, deprecated, renamed: "init(platformString:)")
  public init(cString: UnsafePointer<CChar>)

  @available(*, deprecated, renamed: "withPlatformString(_:)")
  public func withCString<Result>(
    _ body: (UnsafePointer<CChar>) throws -> Result
  ) rethrows -> Result
}
extension String {
  @available(*, deprecated, renamed: "init(validating:)")
  public init?(validatingUTF8 path: FilePath)
}