Skip to content

Instantly share code, notes, and snippets.

@atrick
Last active February 22, 2023 22:38
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save atrick/0249054b13d4964fd9f62fd48df80b48 to your computer and use it in GitHub Desktop.
Save atrick/0249054b13d4964fd9f62fd48df80b48 to your computer and use it in GitHub Desktop.
Constrain implicit raw pointer conversion to bitwise-copyable values

Constrain implicit raw pointer conversion to bitwise-copyable values

Introduction

This proposal adds restrictions on implicit casts to unsafe pointers. It allows implicit casts only when the source value is either "bitwise copyable" or has custom support for unsafe pointer casts. A type is bitwise copyable if copying a value of that type requires nothing more than copying each bit in its representation (i.e. memcpy). Bitwise copyable types do not require deinitialization. Notably, bitwise copyable types cannot contain object references. Raw pointers are primarily intended for use with bitwise-copyable types because it is extremely difficult and dangerous to work with raw bytes in the presence of object references.

The implicit raw pointer casts that we want to prohibit commonly happen accidentally, without any source-level indication of unsafety. They nonetheless pose a serious safety and security concern. They are a common source of programming mistakes leading to runtime crashes but have no discernable benefit.

Motivation

Swift supports implicit inout conversion to unsafe pointers, meaning that a mutable variable can be passed inout to a function that receives an unsafe pointer to the value. According to the normal rules for pointer conversion, this naturally extends to raw pointers. Raw pointers allow the pointer-taking function to operate directly on the value's bytes.

func readBytes(_ pointer: UnsafeRawPointer) {
  assert(pointer.load(as: UInt16.self) == 0xaaaa)
}
    
var x: UInt16 = 0xaaaa
readBytes(&x)

Another higher-level form of implicit pointer conversion allows Swift Array and String values to be passed directly to pointer-taking functions. Here, the callee receives a pointer to the object's contiguously stored elements, rather than to the object itself:

let array = [UInt16(0xaaaa)]
readBytes(array)

These two features are a dangerous combination. The Array and String conversion feature sets up an expectation that collection data types can, in general, be implicitly converted to their elements. When a programmer attempts to make use of this feature for non-Array, non-String data types, they can easily fall into the more general inout conversion behavior instead. Rather than creating a pointer to the collection's elements, the compiler instead exposes the data type's internal representation without warning.

Here, the user likely wants to inspect the contents of a string, but instead they've leaked the internal representation:

func inspectString(string: inout String) {
  readBytes(&string) // reads the string's internal representation
}

This is a pernicious security issue because the code will happen to work during testing for small strings. Removing the '&' sigil changes the string conversion into an array-like conversion:

func inspectString(string: inout String) {
  readBytes(string) // reads the string's characters
}

In the next example, the author clearly expected Foundation.Data to have the same sort of implicit conversion support as Array:

func foo(data: inout Data) {
  readBytes(&data)
}

This compiles without error, but it unintentionally exposes data object's internal storage representation rather than its elements.

Swift 5.7 generalized the type checker's handling of raw pointers to include C functions that take char *. This makes the dangerous combination easier to stumble into:

void read_char(char *input);
read_char(&string) // wrong

The problem of accidentally casting String and Foundation.Data to a raw pointer is a dramatic example of a much broader problem. A type that contains no class references is referred to as bitwise copyable, because copying its value is equivalent to calling memcpy--the copy operations does not carry any type-specific semantics. Casting bitwise-copyable types to raw pointers is reasonable, and has always been supported. Implicit conversion from a non-bitwise-copyable type to a raw pointer, however, is extremely dangerous, and almost always accidental:

var object: AnyObject = ...
readBytes(&object)

Balancing safety with C interoperability usability is often difficult, but in this case we can achieve much greater safety without sacrificing much overall usability. Given the recent improvements to pointer in Swift 5.7, and with a major language update on the horizon, this good time to correct this behavior.

Proposed Solution

We propose introducing a new diagnostic that warns on implicit inout conversion of a non-bitwise-copyable value to a raw pointer.

warning: cannot implicitly convert an inout value of type 'T' to expected argument type 'UnsafeRawPointer' because 'T' may contain an object reference.

Based on user feedback from this warning, we intend to convert the diagnostic to an error in Swift 6.

Generic types are conservatively considered non-bitwise-copyable, so the following example will produce a warning:

func inoutGeneric<T>(t: inout T) {
  readBytes(&t)  // warning: cannot implicitly convert an inout value of type 'T' ...
}

To mitigate source breakage, conversion of generic types that conform to FixedWidthInteger will be allowed.

func inoutGeneric<T: FixedWidthInteger>(t: inout T) {
  readBytes(&t)  // no warning
}

This will handle common generic cases that rely on inout to raw pointer conversion, such as Unicode.Encoding.CodeUnit and swift-nio's ByteBuffer API. It is safe to assume that fixed-width integers are bitwise-copyable.

As discussion in Future Directions, we plan to provide a BitwiseCopyable layout constraint to generalize the solution for generic types.

Detailed Design

Given Swift declarations:

func readBytes(_ pointer: UnsafeRawPointer) {...}
func writeBytes(_ pointer: UnsafeMutableRawPointer) {...}

The new diagnostic warns on the following implicit casts:

// Let T be a statically non-bitwise-copyable type...

var t: T = ... 
readBytes(&t)
writeBytes(&t)

let constArray: [T] = ...
readBytes(constArray)

var array: [T] = ...
readBytes(&array)
writeBytes(&array)

var string: String = ...
readBytes(&string)
writeBytes(&string)

var data: Data = ...
readBytes(&data)
writeBytes(&data)

The warning for general types takes the form:

warning: cannot implicitly convert an inout value of type 'T' to expected argument type 'UnsafeRawPointer' because 'T' may contain an object reference.

The warning for Arrays takes the form:

warning: cannot implicitly convert an inout value of type '[T]' to expected argument type 'UnsafeRawPointer' because 'T' may contain an object reference.

The warning for Strings takes the form:

warning: cannot implicitly convert an inout String to expected argument type 'UnsafeRawPointer'

Implicit casts from a FixedWidthInteger will continue to be supported without a warning:

var int: some FixedWidthInteger = ...
readBytes(&int)
writeBytes(&int)

let constIntArray: [some FixedWidthInteger] = ...
readBytes(constIntArray)

var intArray: [some FixedWidthInteger] = ...
readBytes(&intArray)
writeBytes(&intArray)

Implicit casts from trivial collection elements, will continue to be supported without a warning:

var byteArray: [UInt8] = [0]
readBytes(&byteArray)
writeBytes(&byteArray)

let string: String = sarg
readBytes(string)
readUInt8(string)

Given the C declarations:

void read_char(const char *input);
void read_uchar(const unsigned char *input);
void write_char(char *input);
void write_uchar(unsigned char *input);

Per SE-0324: Relax diagnostics for pointer arguments to C functions, all of the above implicit casts will have the same behavior after substituting read_char or read_uchar in place of readBytes after substituting write_char or write_uchar in place of writeBytes.

Implementation: PR #63825.

Source compatibility

In Swift 5, the proposal introduces a warning without breaking source.

In Swift 6, the warning may become a source-breaking error. This is conditional on whether the community is receptive to using the workarounds that we provide for the warning.

The new diagnostic will only warn on code that views the raw bytes of a class reference or a generic type. In most cases, the source code that will be diagnosed as warning was never actually supported, and in some cases was already undefined behavior. In all cases, a simple workaround is available.

Workarounds for common cases

Users can silence the warning using either withUnsafePointer or withUnsafeBytes as follows. Note that these are all extremely dangerous use cases that are not generally supported, but they will work in practice under specific conditions. Those conditions are out of scope for this proposal, but they are the same regardless of whether the code relies on implicit conversion or uses the workarounds below...

To pass the address of an internal stored property through an opaque pointer (unsupported but not uncommon):

class TestStoredProperty {
  var property: AnyObject? = nil

  func testPropertyId() {
    withUnsafePointer(to: &property) {
      take_opaque_pointer($0)
    }
  }
}

To pass a class reference through an opaque pointer:

// C decl
// void take_opaque_pointer(void *);

let object: AnyObject = ...
withExtendedLifetime (object) {
  take_opaque_pointer(Unmanaged.passUnretained(object).toOpaque());
}

To expose the bitwise representation of class references:

func readBytes(_ pointer: UnsafeRawPointer) {...}

withUnsafePointer(to: object) {
  readBytes($0.baseAddress!)
}

The diagnostic message does not mention specific workarounds, such as withUnsafeBytes(of:), because, although helpful for migration, that would push developers toward writing invalid code in the future. For example, if a user incorrectly tries to convert a user-defined collection directly to a raw pointer, a diagnostic that suggests withUnsafeBytes(of:) would encourage rewriting the code as follows:

withUnsafeBytes(of: &collection) {
  readBytes($0)
}

This actually promotes the behavior that we're trying to prevent! Quite often, the programmer instead needs to reach for a method on a collection type, such as Data.withUnsafeBytes().

Associated object String keys

Associated objects are strongly discouraged in Swift and may be deprecated. Nonetheless, legacy code can't always be redesigned at the time of a language update. We offer some quick workarounds here.

Code that attempts to take the address of a String as an associated object key will now raise a type conversion error:

import Foundation
 
class Container {
  static var key = "key"

  func getObject() -> Any? {
    return objc_getAssociatedObject(self, &Container.key)
  }
}

This can be rewritten using the direct, low-level withUnsafePointer workaround:

class Container {
  static var key = "key"

  func getObject() -> Any? {
    withUnsafePointer(to: Container.key) {
      return objc_getAssociatedObject(self, $0)
    }
  }
}

Alternatively, you can use the key's object identity rather the address of the property. But this only works with objects that require separate allocation at instantiation. Neither NSString, nor NSNumber can be safely used. NSObject is a safe bet:

class Container {
  static var key = NSObject()

  func getID(_ object: AnyObject) -> UnsafeRawPointer {
    return UnsafeRawPointer(Unmanaged.passUnretained(object).toOpaque())
  }
  func getObject() -> Any? {
    return objc_getAssociatedObject(self, getID(Container.key3))
  }
}

A safer and more principled approach would be to use a general purpose property wrapper to define unique, pointer-sized keys:

@propertyWrapper
struct UniqueAddress {
  var _placeholder: Int8 = 0
 
  var wrappedValue: UnsafeRawPointer {
    mutating get {
      // This is "ok" only as long as the wrapped property appears
      // inside of something with a stable address (a global/static
      // variable or class property) and the pointer is never read or
      // written through, only used for its unique value
      return withUnsafeBytes(of: &self) {
        return $0.baseAddress.unsafelyUnwrapped
      }
    }
  }
}

class Container {
  @UniqueAddress static var key
  
  func getObject() -> Any? {
    return objc_getAssociatedObject(self, Container.key)
  }
}

Effects on ABI stability and API resilience

None. Restricting casts has no effect on the ABI or resilience.

Alternatives considered

Narrow, type-specific diagnostics

We could add an attribute to Data and other copy-on-write containers to more selectively suppress implicit conversion. This approach would miss an opportunity to significantly improve safety of the language. As soon as BitwiseCopyable is available, we will use it to qualify other APIs that manipulate raw bytes. BitwiseCopyable is the type that we want programmers to associate with raw memory features going forward.

Forbid implicit inout conversion to raw pointers

We could forbid all implicit inout conversion to raw pointers, except for direct calls to C functions. We do have precedent for allowing certain conversions only for C interoperability: SE-0324: Relax diagnostics for pointer arguments to C functions.

The most common uses involving C functions would still work:

char c_buffer[10]; // C header
read_char(&c_buffer) // called from Swift code

This would, however, break legitimate uses of inout-to-raw-pointer conversion in Swift, forcing the use of a closure-taking API. There are already published examples of Swift-to-C interoperability that rely on this feature, similar to the example from the introduction:

var x: UInt16 = 0xaaaa
readBytes(&x)

We would need to ask users to migrate all these cases to:

var x: UInt16 = 0xaaaa
withUnsafeBytes(of: x) {
  readBytes($0)
}

While this may seem consistent with Swift's policy of explicitly requiring an "unsafe" API in cases that may lead to undefined behavior, the downsides outweigh that benefit:

  • it creates a migration barrier to Swift 6

  • it makes it impossible to write Swift shims on top of C APIs that have the same usability. This was not the case in the aforementioned SE-0324, because those Swift shims simply need to use the correct raw pointer type.

More importantly, this solution does not actually address the cases that pose the most danger. Misuses of implicit conversions usually do involve a pointer-taking C function. In the case of inout conversion, the C function may get a pointer to the data structures control fields rather then its element. And in the case of constant pointers, the C function sometimes returns or otherwise escapes the same pointer.

Forbid implicit Array and String conversion to unsafe pointers

Implicit conversion is generally dangerous because it allows interior pointers to escape, exposing undefined behavior and use-after-free security bugs. The most commonly misused cases are the special cases that were added for convenience: Arrays, and Strings. We could disable these special cases. For example, this would now be en error:

void read_char(char *input);

let string: String = ...
read_char(string)

This case is not as dangerous as the inout conversion to raw pointers case addressed by this proposal because it does not expose internal control data. This case is only problematic when the pointer-taking-function returns or otherwise escapes the transient pointer, potentially resulting in a use-after-free, which is common problem with C APIs in general.

Naturally, this would break C interoperability in some common cases, such as using an Array as a byte buffer or viewing a String's null-terminated UTF8 representation as raw bytes.

Regardless, this issue is unrelated to raw pointers. If additional Array or String conversion restrictions are worthwhile, they merit separate discussion and should be pursued in an independent proposal.

Future directions

Add a BitwiseCopyable layout constraint

Generic code may require a source compatibility workaround even for generic types that are always bitwise copyable at runtime. In the near future, we plan to provide a BitwiseCopyable layout constraint for this purpose.

The following conversions would now be valid:

func foo<T: BitwiseCopyable>(_: T.Type) {
  var t: T = ...
  readBytes(&t)

  let array: [T] = ...
  readBytes(t)

  var array: [T] = ...
  readBytes(&t)
}

Various standard API's already require bitwise copyability, but there is no way to express the requirement. Instead, we use dynamic _isPOD() assertions. For years, there has been strong consensus that this should be handled by a layout constraint. Having this generic type constraint would give users a way to work around the new conversion restrictions in generic code.

A proposal for this feature is in progress. We expect BitwiseCopyable to be available before converting this proposed pointer conversion diagnostic to an error. Until then, programmers need to use the withUnsafePointer or withUnsafeBytes workarounds.

Forbid withUnsafeBytes(of:_:) for non-bitwise-copyable or CoW types

The following use cases should be illegal similar to their inout conversion counterparts. When programmers ask for the "bytes" of a collection, they almost certainly wanted to view the elements. Rather than leak the internal representation of the collection.

var string: String = ...
withUnsafeBytes(of: &string) {...}

var array: [T] = ...
withUnsafeBytes(of: &array) {...}

var set: Set<T> = ...
withUnsafeBytes(of: &set) {...}

var data: Data = ...
withUnsafeBytes(of: &data) {...}

In the future, this can be done with an overload:

@available(swift, deprecated: 6, message: "either use withUnsafePointer(of:) to point to a value containing object references, or use a method to retrieve the contents of a container")
public func withUnsafeBytes<T: BitwiseCopyable, Result>(
  of value: inout T,
  _ body: (UnsafeRawBufferPointer) throws -> Result
) rethrows -> Result {...}

By adding a new CopyOnWriteValue marker protocol with conformances from String, Array, Set, and Data, this restriction could be narrowed to diagnose only the most confusing cases shown above without affecting all non-bitwise-copyable values.

@available(swift, deprecated: 6, message: "use a method to retrieve the contents of a container")
public func withUnsafeBytes<T: CopyOnWriteValue, Result>(
  of value: inout T,
  _ body: (UnsafeRawBufferPointer) throws -> Result
) rethrows -> Result {...}

This is also clear, complementary improvement, but it can be pursued separately as a standard library enhancement. This proposal is focussed on implicit conversions.

Acknowledgments

Thanks to Robert Widmann and Mike Ash for advice on handling associated objects.

Thanks to Joe Groff for suggesting the property wrapper workaround.

Thanks to Becca Royal-Gordon, Guillaume Lessard, and Tim Kientzle for in-depth proposal review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment