@atrick
Last active March 28, 2023 18:07
BufferView Proposal

Introduction

Building Swift system APIs requires a common data type for efficiently viewing contiguous memory as a series of typed elements. This proposal introduces BufferView<T> and MutableBufferView<T> as lowest-common-denominator types that can be used across many low-level APIs without sacrificing safety, efficiency, or generality.

The closest alternative, UnsafeBufferPointer<T>, is not ideal for several reasons:

  • it is ownership unsafe, leading to use-after-free security holes

  • it is bounds unsafe, leading to buffer overflow security holes (although this can be fixed independently)

  • it is inefficient, requiring optional unwrapping on access

  • it is invalid for viewing a raw byte buffer while reinterpreting those bytes as element types. Viewing a byte buffer as UnsafeBufferPointer<T> could expose undefined behavior due to strict aliasing in both C and Swift. System APIs should not force the client to manage the buffer's bound type.
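The last point can be illustrated with today's standard library. The following is a minimal sketch, not part of the proposal: it reads typed values through a raw buffer instead of binding the memory to an element type. The variable names are illustrative, and `loadUnaligned` assumes Swift 5.7 or later.

```swift
// Bytes as they might arrive from a file or socket:
// two little-endian UInt32 values, 1 and 2.
let packet: [UInt8] = [1, 0, 0, 0, 2, 0, 0, 0]

// Read typed values through the raw buffer rather than binding the
// memory to UInt32. `loadUnaligned` does not assume that the byte
// offset is suitably aligned for UInt32.
let words: [UInt32] = packet.withUnsafeBytes { raw in
  stride(from: 0, to: raw.count, by: MemoryLayout<UInt32>.stride).map { offset in
    UInt32(littleEndian: raw.loadUnaligned(fromByteOffset: offset, as: UInt32.self))
  }
}
print(words) // [1, 2]
```

Because the memory is never bound to `UInt32`, the owner of the bytes does not have to track a bound type, which is the property BufferView aims to provide by construction.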

The most important improvement BufferView offers is ownership safety. Achieving it, however, requires new language features. Those features will be proposed separately but are briefly introduced here to show how they relate to BufferView.

Use Cases

BufferView may represent any contiguous memory as a series of typed elements. It allows access to any of these owned storage types:

  • Array contents, exclusively for mutable views

  • "Unique buffers" with move-only ownership, exclusively for mutable views

  • "Managed buffers" with reference semantics

  • Automatically stack-allocated "array" types

  • Pointers into local variables, including homogeneous tuples (a.k.a. statically-sized arrays)

  • Pointers into manually heap or stack-allocated storage

Design Requirements

Efficient across resilience boundaries: BufferView's representation is non-generic. Computing offsets only requires knowing the element type's stride. Specialization can be shared for most types.

Pointer-type safe: BufferView is backed by a raw pointer. It is well-defined and type-safe to vend both [Mutable]BufferView<T> and [Mutable]BufferView<U> over the same memory.

Safely self-slicing: Supports API composability. Pointer-like indices don't need to be rebased.

Efficient access: Subscripting does not require optional pointer unwrapping.

C interoperability: BufferView will vend withUnsafe[Mutable]Bytes and withUnsafe[Mutable]BufferPointer APIs, analogous to Array. Array-like implicit conversion is also possible, and it would be safe for conversion to Unsafe[Mutable]RawPointer. But implicit conversion to a typed pointer opens up the possibility for strict aliasing violations on the C side. Instead, an explicit "unsafe" construct should occur whenever the programmer introduces possible undefined behavior.
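The stride-only offset computation from the first requirement can be sketched with existing raw-pointer APIs. `elementAddress` is a hypothetical helper, not part of the proposal. It takes the stride as a runtime value, so the address arithmetic needs no generic parameter.

```swift
// Hypothetical helper: locate element `index` using only a runtime
// stride. The function itself is non-generic.
func elementAddress(base: UnsafeRawPointer, index: Int, stride: Int) -> UnsafeRawPointer {
  base.advanced(by: index * stride)
}

let values: [Int32] = [10, 20, 30, 40]
let third = values.withUnsafeBytes { raw -> Int32 in
  let address = elementAddress(base: raw.baseAddress!,
                               index: 2,
                               stride: MemoryLayout<Int32>.stride)
  // Only the final load needs to know the element type.
  return address.load(as: Int32.self)
}
print(third) // 30
```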

Ownership safety via nonescaping

An API that accepts a BufferView is independent of where the buffer originates and who owns it. This is safe as long as it's possible to guarantee that the BufferView does not "escape" the invocation of the API. In other words, after the API returns, the caller regains full control over the view's ownership.

Enforcing this property is best accomplished with new language support for @nonescaping types. For such types, the compiler will conservatively enforce that the lifetime of values is limited by their declaration scope. Consider the following BufferView-based API examples:

Function scope:

// The compiler ensures that `view` cannot escape this function scope
func processBuffer(view: BufferView<Element>) {
  ...
}

Closure scope via closure-taking method:

// `$0` cannot escape this closure scope
array.withBufferView { processBuffer(view: $0) }

Vending a view into a buffer with value semantics requires either acquiring read-only access to the variable that owns the buffer or retaining a copy of the underlying buffer storage for the duration of the view's lifetime.
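For comparison, the closest pattern available today is `withUnsafeBufferPointer`. The sketch below uses only existing standard-library API. Unlike the proposed BufferView, the compiler does not stop `buffer` from escaping here, so the programmer has to uphold that rule by hand.

```swift
let samples = [3, 1, 4, 1, 5]

// `buffer` is only valid inside the closure. Returning a computed
// value (not the buffer itself) keeps the view from escaping.
let total = samples.withUnsafeBufferPointer { buffer in
  buffer.reduce(0, +)
}
print(total) // 14
```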

Exclusive closure scope via mutating method:

// `$0` cannot escape this closure scope
array.withMutableBufferView { processBuffer(view: $0) }

In the case of vending a MutableBufferView, the array must perform a uniqueness check and potentially trigger a copy.
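The uniqueness check can be observed with the existing `withUnsafeMutableBufferPointer`, which is assumed here to behave like the proposed `withMutableBufferView` in this respect. A minimal sketch:

```swift
var original = [1, 2, 3]
let snapshot = original  // shares storage with `original` until a mutation

// Mutable access requires unique storage, so the array copies its
// buffer before handing out the mutable view.
original.withUnsafeMutableBufferPointer { buffer in
  buffer[0] = 99
}
print(original) // [99, 2, 3]
print(snapshot) // [1, 2, 3]
```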

With future language and compiler support, APIs can also be designed that return non-escaping types. "ref-bindings" and "inout-bindings" will be proposed as part of the move-only values initiative.

Ref-binding scope:

// Begins a readonly access to `array`
ref bufferView = array.bufferView
...
// Force-copies the array storage
let arrayCopy = _copy(array)
...
processBuffer(view: bufferView)
// Readonly access to `array` ends after its last use

Inout-binding scope:

// Begins an exclusive modification to `array`
inout bufferView = array.mutableBufferView
...
fillBuffer(view: bufferView)
// Exclusive access to `array` ends after its last use

@nonescaping enforcement

Annotating BufferView as @nonescaping ensures that the non-escaping property is automatically enforced for every declaration of type BufferView. This extends to any type composed from a BufferView. But when a BufferView is passed to a generic or a protocol parameter, its non-escaping property is no longer carried by the type. For safety, the compiler must ensure that the non-escaping property survives such type conversions.

When an API refers to a polymorphic type, the compiler can enforce @nonescaping arguments at the API boundary in any of these ways:

  • A @nonescaping parameter annotation

  • An @_effects(nonescaping) annotation, which is unsafe

  • Automatic compiler analysis of always-emit-into-client code

Note that existing generic APIs will need to be retroactively annotated for compatibility with non-escaping values. For example:

extension Collection {
  @nonescaping
  func map<T>(_ transform: (Element) throws -> T) rethrows -> [T]
}

A withoutActuallyEscaping API allows the programmer to bypass compiler enforcement. While such a bypass is possible with @nonescaping enforcement, it is not possible with @moveonly enforcement.
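Swift already provides `withoutActuallyEscaping` for closures. The version for @nonescaping values mentioned above would presumably be analogous, though the proposal does not spell out its shape. A sketch of the existing closure form, with `anySatisfy` as an illustrative name:

```swift
// `withoutActuallyEscaping` temporarily treats a non-escaping closure
// as escaping. The programmer promises it will not outlive the call.
func anySatisfy(_ values: [Int], _ predicate: (Int) -> Bool) -> Bool {
  withoutActuallyEscaping(predicate) { escapable in
    // The lazy `filter` stores its closure, so it needs an escaping one.
    !values.lazy.filter(escapable).isEmpty
  }
}

print(anySatisfy([1, 2, 3], { $0 > 2 })) // true
print(anySatisfy([1, 2, 3], { $0 > 5 })) // false
```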

Feature dependencies

BitwiseCopyable

BufferView exposes some APIs that are only safe for bitwise-copyable (a.k.a. trivial) values. A bitwise-copyable value does not require construction or destruction.
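As a rough illustration of which types qualify, the standard library's underscored `_isPOD` function reports whether a type is trivial. It is an implementation detail rather than supported API, and using it here as a stand-in for the proposed BitwiseCopyable constraint is an assumption of this sketch.

```swift
final class Node {}

// Trivial types can be copied with memcpy and need no destruction.
print(_isPOD(Int.self))              // true
print(_isPOD((UInt8, Double).self))  // true

// Types that hold references need retain/release on copy and destroy.
print(_isPOD(String.self))           // false
print(_isPOD(Node.self))             // false
```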

Reference Implementation

// swiftc -parse-stdlib ./BufferView.swift -Xfrontend -disable-access-control

import Swift

// Any bit pattern would suffice as an invalid pointer,
// as long as no "spare bits" are used.
// For convenience, this bit pattern
// - is not a pattern that would be used by memory smashers
// - has repetitive digits that don't look like garbage
// - is less than a page, so it can never be a real pointer
// - reserves as many low bits as possible so it can't be a tagged pointer
//
// TODO: define this in the ABI header, it is platform-specific.
let _invalidRawPointer = Builtin.inttoptr_Word(0x440._builtinWordValue)

// Index into a buffer.
// This is never directly accessed, and therefore agnostic to mutability.
public struct BufferIndex<T> {
  typealias Pointee = T

  let _rawValue : Builtin.RawPointer

  public init(_ rawValue: Builtin.RawPointer, as: T.Type) {
    self._rawValue = rawValue
  }
  public init(_ rawPointer: UnsafeRawPointer, as: T.Type) {
    self._rawValue = rawPointer._rawValue
  }
}

// View a byte buffer as a sequence of Elements. Self-slicing.
//
// TODO: Support conversion to a non-mutable BufferView.
@nonescaping
public struct MutableBufferView<T> : RandomAccessCollection {
  public typealias Element = T
  public typealias Index = BufferIndex<Element>

  // Note: Always build raw memory containers on top of
  // Unsafe[Mutable]RawPointer, not Unsafe[Mutable]RawBufferPointer,
  // which requires two Optional unwrapping checks on every access!
  public let rawPointer: UnsafeMutableRawPointer
  public let count: Int

  func _getIndex(_ rawPointer: UnsafeMutableRawPointer) -> Index {
    return BufferIndex(rawPointer, as: Element.self)
  }

  func _getPointer(_ index: BufferIndex<Element>) -> UnsafeMutableRawPointer {
    return UnsafeMutableRawPointer(index._rawValue)
  }

  public init(as: Element.Type) {
    self.rawPointer = UnsafeMutableRawPointer(_invalidRawPointer)
    self.count = 0
  }

  public init(reinterpreting rawPointer: UnsafeMutableRawPointer, count: Int, as: Element.Type) {
    self.rawPointer = rawPointer
    self.count = count
  }

  public init(reinterpreting rawBytes: UnsafeMutableRawBufferPointer, as: Element.Type) {
    assert(rawBytes.baseAddress != nil || rawBytes.count == 0)
    self.rawPointer =
      rawBytes.baseAddress ?? UnsafeMutableRawPointer(_invalidRawPointer)
    self.count = rawBytes.count / MemoryLayout<Element>.stride
    precondition(self.count * MemoryLayout<Element>.stride == rawBytes.count)
    precondition(Int(bitPattern: rawBytes.baseAddress).isMultiple(of: MemoryLayout<Element>.alignment))
  }

  public var startIndex: Index { _getIndex(rawPointer) }

  public var endIndex: Index { startIndex.advanced(by: count) }

  public func checkBounds(_ position: Index) {
    precondition(position >= startIndex)
    precondition(position < endIndex)
  }

  public subscript(unchecked index: Index) -> Element {
    get {
      _getPointer(index).load(as: Element.self)
    }
    nonmutating set(newValue) {
      _getPointer(index).storeBytes(of: newValue, as: Element.self)
    }
  }
  // Unlike Unsafe[Mutable]RawBufferPointer, subscripts should be
  // bounds-checked by default in release builds.
  public subscript(index: Index) -> Element {
    get {
      checkBounds(index)
      return self[unchecked: index]
    }
    nonmutating set(newValue) {
      checkBounds(index)
      self[unchecked: index] = newValue
    }
  }
}

// =============================================================================
// Detail

// Pointer-like behavior
extension BufferIndex: Comparable, Hashable, Strideable {
  public static func == (lhs: Self, rhs: Self) -> Bool {
    return Bool(Builtin.cmp_eq_RawPointer(lhs._rawValue, rhs._rawValue))
  }
  public static func < (lhs: Self, rhs: Self) -> Bool {
    return Bool(Builtin.cmp_ult_RawPointer(lhs._rawValue, rhs._rawValue))
  }
  public static func >= (lhs: Self, rhs: Self) -> Bool {
    return Bool(Builtin.cmp_uge_RawPointer(lhs._rawValue, rhs._rawValue))
  }

  public func advanced(by n: Int) -> Self {
    return Self(Builtin.gep_Word(_rawValue, n._builtinWordValue, T.self),
      as: T.self)
  }

  public func distance(to end: Self) -> Int {
    return
      Int(Builtin.sub_Word(Builtin.ptrtoint_Word(end._rawValue),
                           Builtin.ptrtoint_Word(_rawValue)))
      / MemoryLayout<Pointee>.stride
  }

  public func hash(into hasher: inout Hasher) {
    hasher.combine(UInt(bitPattern: UnsafeRawPointer(_rawValue)))
  }
}

Key aspects of the reference implementation

Self-slicing requires a pointer-based index:

public struct BufferIndex<T> {
  let _rawValue : Builtin.RawPointer

public struct MutableBufferView<T> : RandomAccessCollection {
  public typealias Element = T
  public typealias Index = BufferIndex<Element>
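To show why pointer-based indices make self-slicing straightforward, here is a minimal read-only sketch built on public pointer types rather than `Builtin`. `ByteView` is a hypothetical type, not the proposal's implementation. An index is an absolute address, so the same index value refers to the same element in a view and in any slice of it.

```swift
// Hypothetical read-only sketch; `ByteView` is not a type from the proposal.
struct ByteView {
  let start: UnsafeRawPointer
  let count: Int

  var startIndex: UnsafeRawPointer { start }
  var endIndex: UnsafeRawPointer { start.advanced(by: count) }

  // Element access with an absolute (pointer) index.
  subscript(index: UnsafeRawPointer) -> UInt8 {
    precondition(index >= startIndex && index < endIndex, "index out of bounds")
    return index.load(as: UInt8.self)
  }

  // Slicing returns the same type. Indices are addresses, so nothing is rebased.
  subscript(range: Range<UnsafeRawPointer>) -> ByteView {
    precondition(range.lowerBound >= startIndex && range.upperBound <= endIndex,
                 "range out of bounds")
    return ByteView(start: range.lowerBound,
                    count: range.lowerBound.distance(to: range.upperBound))
  }
}

let storage: [UInt8] = [10, 20, 30, 40, 50]
let (fromView, fromSlice) = storage.withUnsafeBytes { raw -> (UInt8, UInt8) in
  let view = ByteView(start: raw.baseAddress!, count: raw.count)
  let index = view.startIndex.advanced(by: 3)
  let slice = view[view.startIndex.advanced(by: 2)..<view.endIndex]
  // The same index names the same element in the view and in its slice.
  return (view[index], slice[index])
}
print(fromView, fromSlice) // 40 40
```

With integer indices that start at zero, a slice of the same type would have to shift every index, which is the rebasing step that pointer-like indices avoid.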

No optional pointer unwrapping on access:

  public let rawPointer: UnsafeMutableRawPointer
  public let count: Int
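The idea of storing a non-optional pointer can be sketched with public API. `RawView` is a hypothetical type. The `0x440` placeholder is the value the reference implementation uses for its invalid pointer, and the sketch pays for the Optional check once at construction rather than on every access.

```swift
// Hypothetical sketch; `RawView` is not a type from the proposal.
struct RawView {
  // Same placeholder value the reference implementation uses for "no memory".
  static let invalidPointer = UnsafeRawPointer(bitPattern: 0x440)!

  let base: UnsafeRawPointer  // never nil, so access needs no unwrapping
  let count: Int

  init(_ bytes: UnsafeRawBufferPointer) {
    // Handle the Optional base address once, here.
    base = bytes.baseAddress ?? RawView.invalidPointer
    count = bytes.count
  }

  subscript(offset: Int) -> UInt8 {
    precondition(offset >= 0 && offset < count, "offset out of bounds")
    return base.load(fromByteOffset: offset, as: UInt8.self)
  }
}

let emptyView = RawView(UnsafeRawBufferPointer(start: nil, count: 0))
print(emptyView.count) // 0

let payload: [UInt8] = [7, 8, 9]
let second = payload.withUnsafeBytes { RawView($0)[1] }
print(second) // 8
```

The placeholder is never dereferenced, because an empty view has a count of zero and every access is bounds-checked first.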

Typed view over raw memory:

  public init(reinterpreting rawPointer: UnsafeMutableRawPointer, count: Int, as: Element.Type)
  public init(reinterpreting rawBytes: UnsafeMutableRawBufferPointer, as: Element.Type)
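The second initializer checks that the byte count is a whole number of strides and that the base address is suitably aligned. The sketch below expresses the same checks as a standalone function that returns nil instead of trapping. `elementCount(in:as:)` is a hypothetical helper introduced only for illustration.

```swift
// Hypothetical helper: how many whole `T` elements does `rawBytes` hold?
// Returns nil if the byte count is not a multiple of the stride or the
// base address is misaligned for `T`.
func elementCount<T>(in rawBytes: UnsafeRawBufferPointer, as type: T.Type) -> Int? {
  let stride = MemoryLayout<T>.stride
  guard rawBytes.count % stride == 0 else { return nil }
  guard Int(bitPattern: rawBytes.baseAddress) % MemoryLayout<T>.alignment == 0 else {
    return nil
  }
  return rawBytes.count / stride
}

// Allocate with a known alignment so the results are deterministic.
let allocation = UnsafeMutableRawBufferPointer.allocate(
  byteCount: 8, alignment: MemoryLayout<UInt32>.alignment)
let whole = UnsafeRawBufferPointer(allocation)

let twoWords = elementCount(in: whole, as: UInt32.self)
let ragged = elementCount(in: UnsafeRawBufferPointer(rebasing: whole[0..<7]),
                          as: UInt32.self)
let misaligned = elementCount(in: UnsafeRawBufferPointer(rebasing: whole[1..<5]),
                              as: UInt32.self)
print(twoWords as Any, ragged as Any, misaligned as Any) // Optional(2) nil nil

allocation.deallocate()
```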

Bounds-checking by default in all builds:

  public subscript(unchecked index: Index) -> Element
  public subscript(index: Index) -> Element

Reference

WWDC20 talk Safely Manage Pointers in Swift

Most of the API design requirements arose from a design discussion with Michael Ilseman and Guillaume Lessard. A more involved prototype from June 21, 2021 circulated privately.
