

@hpux735
Last active March 4, 2016 02:46
Draft Swift Evolution proposal

Improve the portability of Swift with differently signed char.

Introduction

In C, the signedness of plain char is implementation-defined. A convention is set either by the platform, as on Windows, or by the architecture's ABI specification, as is typical on System V-derived systems. A subset of known platforms germane to this discussion, and their char signedness, is given below.

char        ARM            mips           PPC            PPC64          i386         x86_64
Linux/ELF   unsigned [1]   unsigned [2]   unsigned [3]   unsigned [4]   signed [5]   signed [6]
Mach-O      signed [7]     N/A            signed [7]     signed [7]     signed [7]   signed [7]
Windows     signed [8]     signed [8]     signed [8]     signed [8]     signed [8]   signed [8]

This is not a great problem in C, and indeed many programmers aren't even aware of the issue. Part of the reason is that C silently converts between many similar types as necessary. Notably, even with -Wall, clang produces no warnings when converting between any pair of char, unsigned char, signed char and int. Swift, in contrast, does not convert types without explicit direction from the programmer. As implemented, Swift imports char as Int8, regardless of whether the underlying platform uses signed or unsigned char. As every Apple platform (seemingly) uses signed char by convention, this was an appropriate choice. However, now that Swift is being ported to more and more platforms, it is important that we decide how to handle the alternate case.

The problem at hand may be most simply demonstrated by a small example. Consider a C API where a set of functions return values as char:

char charNegFunction(void)    { return  -1; }
char charBigPosFunction(void) { return 255; }
char charPosFunction(void)    { return   1; }

Then, if the API is used from C as follows:

#include <stdio.h>
int main(void) {
    char negValue = charNegFunction();
    char posValue = charPosFunction();
    char bigValue = charBigPosFunction();
    printf("From clang: Negative value: %d, positive value: %d, big positive value: %d\n", negValue, posValue, bigValue);
    return 0;
}

You get exactly what you would expect on signed char platforms:

From clang: Negative value: -1, positive value: 1, big positive value: -1

and on unsigned char platforms:

From clang: Negative value: 255, positive value: 1, big positive value: 255

In its current state, Swift behaves like C on signed char platforms, regardless of the signedness of char on the platform it runs on:

From Swift: Negative value: -1, positive value: 1, big positive value: -1

This code is available here, if you would like to play with it yourself.
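The difference comes down to how one byte is interpreted. The following is a minimal sketch that models the value returned by charBigPosFunction as a raw UInt8 rather than calling the C function; the variable names are illustrative only.

```swift
// charBigPosFunction hands back the byte 0xFF; C and Swift differ
// only in how they interpret those eight bits.
let byteFromC: UInt8 = 0xFF

// What C code sees on an unsigned char platform (e.g. Linux on ARM).
let unsignedCharValue = Int(byteFromC)

// What Swift sees today: char is imported as Int8 on every platform,
// so the same bits are read as a two's-complement signed value.
let importedValue = Int(Int8(bitPattern: byteFromC))

print("C (unsigned char): \(unsignedCharValue), Swift (Int8): \(importedValue)")
// prints "C (unsigned char): 255, Swift (Int8): -1"
```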

Motivation

The third stated focus area for Swift 3.0 is portability. To quote the evolution document:

  • Portability: Make Swift available on other platforms and ensure that one can write portable Swift code that works properly on all of those platforms.

As it stands, Swift's indifference to the signedness of char when importing from C can be ignored in many cases. Inaction, however, leaves the door open for extremely subtle and difficult-to-diagnose bugs any time a C API relies on values greater than 127 on platforms with unsigned char; in this case the current import model certainly violates the Principle of Least Astonishment.

This is not an abstract problem that I want to have solved "just because." This issue has been a recurrent theme, and has come up several times during code review. I’ve included a sampling of these to provide some context to the discussion:

In these discussions, we clearly struggle to solve the issues at hand adequately without the changes proposed here. Indeed, this proposal was suggested by Joe Groff in Swift Foundation PR-265.

These changes should happen during a major release. Considering them for Swift 3 will let us move forward efficiently while confining any source incompatibilities to transitions where users expect them. Code that already works properly on each of these platforms is likely to continue to work. Further, implementing this proposal will expose cases where a problem exists but its symptoms have not yet been noticed.

Related bugs

Thanks to Ben Rimmington for identifying related bugs

Notable comment:

  • Chris Lattner: "I don't have a strong opinion on it, I was sort of hoping that all the C* types would go away someday. I agree that in this case, CChar does serve a useful purpose though."

Proposed solution

A new CChar type will be defined, and all char types from C will be mapped to it. The CChar type will be mostly opaque; only a small set of operations will be allowed on it, leaving users few choices other than to make an educated decision about which arithmetic type to convert it to.

Detailed design

In general, it will not be possible to perform arithmetic operations on the new CChar. It will, however, be easy to convert a CChar to UInt8 or Int8. These conversions will be implemented by adding init methods to Int8, UInt8, and CChar:

// Proposed opaque type: the stored byte has no numeric meaning of its own.
struct CChar {
  fileprivate var _inaccessible: UInt8
}

extension Int8 {
  init(_ cchar: CChar) {
    // Reinterpret the stored bits as a signed value.
    self = Int8(bitPattern: cchar._inaccessible)
  }
}

extension UInt8 {
  init(_ cchar: CChar) {
    self = cchar._inaccessible
  }
}

extension CChar {
  init(_ intVal: Int8) {
    _inaccessible = UInt8(bitPattern: intVal)
  }

  init(_ uintVal: UInt8) {
    _inaccessible = uintVal
  }
}

I do think that there are a few arithmetic and comparison operators that may be considered for use with CChar:

  • Equivalence ==
  • Test/set individual bits i.e.:
    • CChar.setBit(bit: Int, to: Bool)
    • CChar.testBit(bit: Int) -> Bool
  • Bit-shift operators << >> (with either or both zero fill or one fill)

These few operators are included because char is often used as a bitfield. In that case, it does not make sense to think of it as truly signed or unsigned. A small number of operators would allow the continued use of this type in this particular case without inappropriately forcing it to take on a numeric meaning. I'm not wedded to these operators, or to the concept in general, but I'd like it to be part of the discussion.
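A minimal sketch of that restricted surface follows. It assumes a hypothetical OpaqueCChar struct in place of the proposed CChar, so that it does not clash with the standard library's CChar, and the asInt8/asUInt8 properties are illustrative stand-ins for the conversion initializers. Only ==, single-bit test/set, and zero-fill shifts are provided; a one-fill (arithmetic) right shift is left out. It relies on Swift 4's non-trapping "smart" shifts.

```swift
// OpaqueCChar is a hypothetical stand-in for the proposed CChar.
struct OpaqueCChar: Equatable {
    // One byte with no signed or unsigned meaning of its own.
    private var bits: UInt8

    init(_ uintVal: UInt8) { bits = uintVal }
    init(_ intVal: Int8) { bits = UInt8(bitPattern: intVal) }

    // Explicit numeric interpretations; the caller chooses one.
    var asInt8: Int8 { return Int8(bitPattern: bits) }
    var asUInt8: UInt8 { return bits }

    // Bit 0 is the least significant bit.
    func testBit(bit: Int) -> Bool {
        precondition((0..<8).contains(bit), "bit index out of range")
        let mask: UInt8 = 1 << bit
        return bits & mask != 0
    }

    mutating func setBit(bit: Int, to value: Bool) {
        precondition((0..<8).contains(bit), "bit index out of range")
        let mask: UInt8 = 1 << bit
        if value { bits |= mask } else { bits &= ~mask }
    }

    // Zero-fill shifts; shifting by 8 or more yields zero.
    static func << (lhs: OpaqueCChar, rhs: Int) -> OpaqueCChar {
        return OpaqueCChar(lhs.bits << rhs)
    }

    static func >> (lhs: OpaqueCChar, rhs: Int) -> OpaqueCChar {
        return OpaqueCChar(lhs.bits >> rhs)
    }
}

// Usage: treat the byte as flags without ever choosing a sign.
let flags: UInt8 = 0b1000_0001
var c = OpaqueCChar(flags)
print(c.testBit(bit: 7))            // true
c.setBit(bit: 0, to: false)
print(c.asUInt8, c.asInt8)          // 128 -128
print((c >> 7).asUInt8)             // 1
print(c == OpaqueCChar(Int8(-128))) // true
```

Equality compares bit patterns, so a CChar built from Int8(-128) equals one built from UInt8(128), which is the behavior a sign-agnostic type needs.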

Impact on existing code

Though the change itself is relatively minor, its impact on other parts of the project, including stdlib and Foundation, cannot be ignored. Every use of C char in the standard library that I encountered can be converted to UInt8 immediately without consequence, as the vast majority of tests pass on both Int8 and UInt8 platforms. As users encounter issues with other libraries and APIs that are less portable across systems with varying char signedness, other choices will have to be made. This proposal will, however, provide the tools to perform this work.
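Converting to UInt8 is lossless because Int8 and UInt8 are the same size: the conversion only reinterprets each byte's bit pattern. A minimal sketch in present-day Swift (String(decoding:as:) post-dates this proposal), using a hand-written Int8 array in place of a real C buffer:

```swift
// Bytes as Swift currently imports a C char buffer: Int8 on every platform.
// "é" in UTF-8 is 0xC3 0xA9; both bytes are negative when read as Int8.
let imported: [Int8] = [0x48, 0x69, -61, -87]

// Reinterpret each byte's bit pattern as UInt8; no information is lost.
let utf8Bytes = imported.map { UInt8(bitPattern: $0) }

// UTF-8 decoding works on UInt8 code units.
let text = String(decoding: utf8Bytes, as: UTF8.self)
print(text)  // prints "Hié"
```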

Alternatives considered

  • Another solution (and the one I originally proposed) is that CChar be aliased to UInt8 on targets where char is unsigned, and Int8 on platforms where char is signed.

  • Status quo. Currently, Swift treats all unspecified chars as signed. This works most of the time, but I think we can do better.
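The first alternative above could be expressed with Swift's conditional compilation blocks. This is a minimal sketch derived from the table in the introduction: PortableCChar is a hypothetical name chosen so the sketch does not shadow the standard library's CChar, and Linux on mips is treated as unsigned only because the table lists it that way.

```swift
// PortableCChar is a hypothetical name for this sketch.
#if os(macOS) || os(iOS) || os(tvOS) || os(watchOS) || os(Windows) || arch(i386) || arch(x86_64)
// Mach-O and Windows targets, plus x86 everywhere: char is signed.
typealias PortableCChar = Int8
#else
// Remaining targets, including the Linux/ELF ARM, mips, PPC and PPC64
// entries in the table: char is unsigned.
typealias PortableCChar = UInt8
#endif

// The same byte now reads differently depending on the target.
let byte = PortableCChar(truncatingIfNeeded: 0xFF)
print(Int(byte))  // -1 where char is signed, 255 where it is unsigned
```

The drawback is visible in the last line: code that compiles on every platform can still compute different values, which is why the main proposal prefers an opaque type that forces an explicit choice.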

Footnotes

[7]: proof by construction (is it signed by convention?)

$ cat test.c
char _char(char a) { return a; }
signed char _schar(signed char a) { return a; }
unsigned char _uchar(unsigned char a) { return a; }

$ clang -S -emit-llvm -target <arch>-unknown-windows test.c -o -
$ clang -S -emit-llvm -target <arch>-unknown-darwin test.c -o -

and look for "signext" or "zeroext" in the @_char definition.

[8]: Windows char is signed by convention.

@modocache

Awesome, thanks for working on this! Overall it looks great! I also had a few notes:

  • "swift returns behaves only" is a typo. I'd write "Swift behaves only".
  • Is there really no better way to determine which platforms have unsigned chars? The proposed #if os(OSX) || os(iOS) || os(windows) || arch(i386) || arch(x86_64) seems a little inelegant.
  • Forgive my ignorance: I seem to recall many C APIs related to strings having UnsafeMutablePointer<UInt8> in their function signatures after being transformed to Swift. Does this proposal affect this at all?

@hpux735
Author

hpux735 commented Mar 2, 2016

Thanks for the notes @modocache!

I don't know of a way to streamline that test, but I'm all ears! This is essentially a Karnaugh map of the table above, and I think it's the minimal solution. I also realize now that I'm missing os(tvOS) || os(watchOS). I'm really looking forward to having more expressive directives.

The Strings API is not as consistent as I thought it would be. Regardless of the system char, UTF8 uses UInt8. This is fixed by the Unicode implementation, and within stdlib there was an existing cast from Int8 to UInt8. That cast still exists, though it's now a no-op on systems with unsigned char.

However, there were some occasions where string processing was performed with Int8. For example, let lhsPtr = UnsafePointer<Int8>(_core.startASCII) is a common pattern I encountered. (startASCII points to UInt8 because it addresses UTF-8 code units.) I suspect that some of this could be streamlined in the future by aligning the types in a handful of other functions. For the purposes of exploration, I replaced the instances of Int8 with CChar, which either performs a conversion similar to the original implementation or is a no-op.

If the existing signatures use UnsafeMutablePointer<UInt8>, there should be no change. The only places where I had to change anything were where Int8 was used for APIs that came from (or were polluted by) C char.
