Skip to content

Instantly share code, notes, and snippets.

@atrick
Last active October 14, 2022 21:14
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save atrick/bb4777d9731df92dfc7d71d1aae27a99 to your computer and use it in GitHub Desktop.
Save atrick/bb4777d9731df92dfc7d71d1aae27a99 to your computer and use it in GitHub Desktop.

Constrain implicit raw pointer conversion to trivial values

Introduction

Swift supports implicit inout conversion to unsafe pointers. A mutable variable can be passed inout to a function that receives an unsafe pointer to the value. According to the normal rules for pointer conversion, this naturally extends to raw pointers. Raw pointers allow the pointer-taking function to operate directly on the value's bytes.

func readBytes(_ pointer: UnsafeRawPointer) {
  assert(pointer.load(as: UInt16.self) == 0xaaaa)
}
    
var x: UInt16 = 0xaaaa
readBytes(&x)

A more sophisticated form of implicit pointer conversion allows Swift Array and String values to be passed directly to pointer-taking functions. The callee receives a pointer to the contiguously stored elements.

let array = [UInt16(0xaaaa)]
readBytes(array)

These two features are a dangerous combination: (1) the expectation that collections can be implicitly converted to their elements, along with (2) the raw pointer conversion rule. Not all collections support the same style of implicit conversion to elements, and not all support the same variations. When the intended conversion isn't available, the compiler instead exposes the data type's internal representation without warning. Balancing safety with C interoperability usability is often difficult, but in this case we can achieve much greater safety without sacrificing much overall usability. Given the recent improvements to pointer in Swift 5.7, and with a major language update on the horizon, this good time to correct this behavior.

Motivation

Here, the user likely wants to inspect the contents of a string, but instead they've leaked the internal representation:

func inspectString(string: inout String) {
  readBytes(&string) // reads the string's internal representation
}

This is a pernicious security issue because the code will happen to work during testing for small strings. Removing the '&' sigil changes the string conversion into an array-like conversion:

func inspectString(string: inout String) {
  readBytes(string) // reads the string's characters
}

In the next example, the author clearly expected Foundation.Data to have the same sort of implicit conversion support as Array:

func foo(data: inout Data) {
  readBytes(&data)
}

This compiles without error, but it unintentionally exposes data object's internal storage representation rather than its elements.

Swift 5.7 generalized the type checker's handling of raw pointers to include C functions that take char *. This makes the dangerous combination easier to stumble into:

void read_char(char *input);
read_char(&string) // wrong

The problem of accidentally casting String and Foundation.Data to a raw pointer is a dramatic example of a much broader problem. A type that contains no class references is referred to as "trivial". Casting trivial types to raw pointers is reasonable, and has always been supported. Implicit conversion from a non-trivial type to a raw pointer, however, is extremely dangerous, and almost always accidental:

var object: AnyObject = ...
readBytes(&object)

Proposed Solution

We will introduce a new diagnostic that warns on implicit inout conversion of a non-trivial value to a raw pointer.

warning: cannot implicitly convert an inout value of type 'T' to expected argument type 'UnsafeRawPointer' because its type may contain an object reference.

Based on user feedback from this warning, we intend to convert the diagnostic to an error in Swift 6.

Generic types are conservatively considered non-trivial, so the following example will produce a warning:

func inoutGeneric<T>(t: inout T) {
  readBytes(&t)  // warning: cannot implicitly convert an inout value of type 'T' ...
}

To mitigate source breakage, conversion of generic types that conform to FixedWidthInteger will be allowed.

func inoutGeneric<T: FixedWidthInteger>(t: inout T) {
  readBytes(&t)  // no warning
}

This will handle common generic cases that rely on inout to raw pointer conversion, such as Unicode.Encoding.CodeUnit and swift-nio's ByteBuffer API. It is safe to assume that fixed-width integers are bitwise-copyable.

Detailed Design

Given:

func readBytes(_ pointer: UnsafeRawPointer) {...}
func writeBytes(_ pointer: UnsafeRawPointer) {...}

// Let T be a statically non-trivial type that
// does not conform to ExpressibleByIntegerLiteral...

The new diagnostic will disallow each of the following calls to readBytes or writeBytes:

var t: T = ... 
readBytes(&t)
writeBytes(&t)

let array: [T] = ...
readBytes(t)

var array: [T] = ...
readBytes(&t)
writeBytes(&t)

var string: String = ...
readBytes(&string)
writeBytes(&string)

var data: Data = ...
readBytes(&data)
writeBytes(&data)

Implementation: PR #60616

Source compatibility

This is a source breaking change. The new diagnostic will only break code that views the raw bytes of a class reference or a generic type. The warning will be introduced immediately. If the community feedback is not overwhelmingly negative, it will become an error in Swift 6 mode.

Workarounds for common cases

Users can silence the warning using either withUnsafePointer or withUnsafeBytes as follows. Note that these are all extremely dangerous use cases that are not generally supported, but work in practice under specific conditions. Those conditions are out of scope for this proposal, but they are the same regardless of whether the code relies on implicit conversion or uses the workarounds below...

To pass the address of an internal stored property through an opaque pointer (unsupported but not uncommon):

class TestStoredProperty {
  var property: AnyObject? = nil

  func testPropertyId() {
    withUnsafePointer(to: &property) {
      take_opaque_pointer($0)
    }
  }
}

To pass a class reference through an opaque pointer:

// C decl
// void take_opaque_pointer(void *);

let object: AnyObject = ...
take_opaque_pointer(Unmanaged.passUnretained(object).toOpaque());

To expose the bitwise representation of class references:

func readBytes(_ pointer: UnsafeRawPointer) {...}

withUnsafePointer(to: object) {
  readBytes($0.baseAddress!)
}

Associated object String keys

Associated objects are strongly discouraged in Swift and may be deprecated. Nonetheless, legacy code can't always be redesigned at the time of a language update. We offer some quick workarounds here.

Code that attempts to take the address of a String as an associated object key will now raise a type conversion error:

import Foundation
 
class Container {
  static var key = "key"

  func getObject() -> Any? {
    return objc_getAssociatedObject(self, &Container.key)
  }
}

This can be rewritten using the direct, low-level withUnsafePointer workaround:

class Container {
  static var key = "key"

  func getObject() -> Any? {
    withUnsafePointer(to: Container.key) {
      return objc_getAssociatedObject(self, $0)
    }
  }
}

Alternatively, you can use the key's object identity rather the address of the property. But this only works with objects that require separate allocation at instantiation. Neither NSString, nor NSNumber can be safely used. NSObject is a safe bet:

class Container {
  static var key = NSObject()

  func getID(_ object: AnyObject) -> UnsafeRawPointer {
    return UnsafeRawPointer(Unmanaged.passUnretained(object).toOpaque())
  }
  func getObject() -> Any? {
    return objc_getAssociatedObject(self, getID(Container.key3))
  }
}

A safer and more principled approach would be to use a general purpose property wrapper to define unique, pointer-sized keys:

@propertyWrapper
struct UniqueAddress {
  var _placeholder: Int8 = 0
 
  var wrappedValue: UnsafeRawPointer {
    mutating get {
      // This is "ok" only as long as the wrapped property appears
      // inside of something with a stable address (a global/static
      // variable or class property) and the pointer is never read or
      // written through, only used for its unique value
      return withUnsafeBytes(of: &self) {
        return $0.baseAddress.unsafelyUnwrapped
      }
    }
  }
}

class Container {
  @UniqueAddress static var key
  
  func getObject() -> Any? {
    return objc_getAssociatedObject(self, Container.key)
  }
}

Effects on ABI stability and API resilience

None

Alternatives considered

Narrow, type-specific diagnostics

We could add an attribute to Data and other copy-on-write containers to more selectively suppress implicit conversion. This approach would miss an opportunity to significantly improve safety of the language.

Forbid implicit inout conversion to raw pointers

We could simply forbid implicit inout conversion to raw pointers. That would be a simpler and more consistent language rule. But a strict rule would break all current legitimate uses of inout-to-raw-pointer conversion, forcing the use of a closure-taking API for C interoperability. For example, this simple C interoperability case would no longer work:

char c_buffer[10]; // C header

read_char(&c_buffer) // called from Swift code

There are already published examples of Swift-to-C interoperability that rely on this feature, similar to the example from the introduction:

var x: UInt16 = 0xaaaa
readBytes(&x)

[QUESTION] Are function pointers ever legitimately be passed as void *. We have in tree tests like this:

func takesMutableVoidPointer(_ x: UnsafeMutableRawPointer) {}
var f: () -> () = {}
takesMutableVoidPointer(&f)

Forbid implicit Array and String conversion to raw pointers

This does not solve the central problem of inout conversion to raw pointers, which is in fact the dangerous case. It would break C interoperability in some cases when using an Array as a byte buffer or viewing a String's null-terminated UTF8 representation as raw bytes.

Future work

Add a BitwiseCopyable marker protocol

Various standard API's require bitwise copyability, but there is no way to express the requirement. Instead, we use dynamic _isPOD() assertions. For years, there has been strong consensus that this should be handled by a marker protocol. Having such a marker protocol would give users a way to work around the new conversion restrictions in generic code. The following conversion would now be valid:

func foo<T: BitwiseCopyable>(_: T.Type) {
  var t: T = ...
  readBytes(&t)

  let array: [T] = ...
  readBytes(t)

  var array: [T] = ...
  readBytes(&t)
}

Forbid withUnsafeBytes(of:_:) for nontrivial or CoW types

The following use cases should be illegal similar to their inout conversion counterparts. When programmers ask for the "bytes" of a collection, they almost certainly wanted to view the elements. Rather than leak the internal representation of the collection.

var string: String = ...
withUnsafeBytes(of: &string) {...}

var array: [T] = ...
withUnsafeBytes(of: &array) {...}

var set: Set<T> = ...
withUnsafeBytes(of: &set) {...}

var data: Data = ...
withUnsafeBytes(of: &data) {...}

This can be done with an overload:

@available(swift, deprecated: 6, message: "either use withUnsafePointer(of:) to point to a value containing object references, or use a method to retrieve the contents of a container")
public func withUnsafeBytes<T: BitwiseCopyable, Result>(
  of value: inout T,
  _ body: (UnsafeRawBufferPointer) throws -> Result
) rethrows -> Result {...}

By adding a new CopyOnWriteValue marker protocol with conformances from String, Array, Set, and Data, this restriction could be narrowed to diagnose only the most confusing cases shown above without affecting all nontrivial values.

@available(swift, deprecated: 6, message: "use a method to retrieve the contents of a container")
public func withUnsafeBytes<T: CopyOnWriteValue, Result>(
  of value: inout T,
  _ body: (UnsafeRawBufferPointer) throws -> Result
) rethrows -> Result {...}

Acknowledgments

Thanks to Robert Widmann, Mike Ash for advice on handling associated objects and Joe Groff for suggesting the property wrapper workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment