@mikeschinkel
Last active September 17, 2019 11:24
Proposal for Union Class Types in PHP

This is a strawman proposal offered as an addition to the Union Types v2 proposal from Nikita Popov.

This proposal introduces a special kind of class called a union class, which is created by adding a types keyword followed by a vertical-bar-separated list of types, just like the list found in Nikita's proposal.

Note: I know that mixed is not a valid typehint, but I use it below anyway for clarity.

Benefits

The benefits of this proposal over and above the union_types_v2 proposal include:

  1. Addresses the scenarios where type aliases in Union Types v2 would be desired, while providing a more robust and type-safe alternative to the two general approaches envisioned by that proposal.

  2. Leverages existing syntax and semantics of classes and objects, requiring only a small amount of language change.

  3. Full type safety when accessing typed values passed to functions, and anywhere else union instances are used.

  4. No ambiguity of types except where specifically wanted, e.g. value() and setValue()

  5. Ability to capture into one variable the value passed from the caller and then pass it to called functions without having to first dereference the typed value.

  6. Ability to create, manipulate and pass around unioned values and use them in contexts not yet envisioned by the PHP language's implementors.

Specifics

Declaring a union

A very simple Number union class would look like this:

class Number {
    types int|float;
}

Built-in methods a union declaration would imply

Any class defined with a types keyword would automatically get at least five (5) methods:

  • Three (3) specifically-named methods: value(), setValue() and type(), and
  • At least two (2) methods, each with the name to{Type}(), one for each unioned type.

So for the Number union the built-in methods would be:

  • public function type():string
  • public function value():mixed
  • public function setValue(mixed)
  • public function toInt():?int
  • public function toFloat():?float

If we added a string type to the union there would be a sixth method:

  • public function toString():?string

If we unioned a class Foo then there would be another method:

  • public function toFoo():?Foo

If we unioned a namespaced class \Foo\Bar then there would be yet another method:

  • public function toFoo_Bar():?\Foo\Bar

There would also be a static method that would return an array of the types defined in the class:

  • public static function types():?string[]
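
As a rough illustration of what the Number declaration above would imply, here is a hand-written PHP 8 approximation. The types keyword does not exist, so every method is spelled out manually; the class name NumberUnion is an assumption chosen for this sketch, and the behavior shown is one reading of the proposal rather than a definitive implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical userland stand-in for `class Number { types int|float; }`.
// Under the proposal PHP would generate these methods; here they are hand-written.
final class NumberUnion
{
    private int|float $value;

    public function __construct(int|float $value)
    {
        $this->value = $value;
    }

    /** Names of the unioned types, mirroring the proposed static types(). */
    public static function types(): array
    {
        return ['int', 'float'];
    }

    /** Name of the type currently held: 'int' or 'float'. */
    public function type(): string
    {
        return is_int($this->value) ? 'int' : 'float';
    }

    public function value(): int|float
    {
        return $this->value;
    }

    public function setValue(int|float $value): void
    {
        $this->value = $value;
    }

    /** Returns the value only when an int is currently held, otherwise null. */
    public function toInt(): ?int
    {
        return is_int($this->value) ? $this->value : null;
    }

    /** Returns the value only when a float is currently held, otherwise null. */
    public function toFloat(): ?float
    {
        return is_float($this->value) ? $this->value : null;
    }
}

$n = new NumberUnion(123);
echo $n->type(), PHP_EOL;     // int
var_dump($n->toFloat());      // NULL
$n->setValue(123.45);
echo $n->type(), PHP_EOL;     // float
```

The sketch picks the "return null on a type mismatch" variant of the to{Type}() methods; the throwing variant would replace the null branches with a thrown TypeError.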

Declaring a union with methods

A more complex union class, which also shows how the built-in methods would be used, would look like this:

class Number {
    types int|float;

    public function __construct(int|float $number) {
        $this->setValue($number);
    }

    public function getInt(): int {
        switch ( $this->type() ) {
        case  'int':
            return $this->toInt();
        case  'float':
            return intval($this->toFloat());
        }
        return 0;
    }

    public function getFloat(): float {
        switch ( $this->type() ) {
        case  'int':
            return 1.0 * $this->toInt();
        case  'float':
            return $this->toFloat();
        }
        return 0.0;
    }
}

Accepting params into a union instance

Here is an example that uses an anonymous union class as the parameter type:

function showNumber(new class{types int|float} $number) {
   echo $number->value();
}

And then a shorthand I propose which would be the equivalent of the prior example:

function showNumber2(int|float $number) {
   echo $number->value();
}

These functions would be called like so:

showNumber(123);          // Prints 123
showNumber(1.23);         // Prints 1.23
showNumber("123");        // Throws a type error

These functions would also accept a matching union type instance instead of automatically creating one when called:

$number = new Number(123);
showNumber($number);      // Prints 123
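
The automatic wrapping described above can be approximated in current PHP by wrapping explicitly at the top of the function. This is only a sketch under stated assumptions: NumberBox and its wrap() helper are hypothetical names, and strict_types is enabled so that the string "123" is rejected instead of being coerced to an int.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the proposed Number union class.
final class NumberBox
{
    public function __construct(private int|float $value) {}

    public function value(): int|float
    {
        return $this->value;
    }

    // Emulates the proposed implicit wrapping: accepts a raw value or an existing instance.
    public static function wrap(int|float|self $number): self
    {
        return $number instanceof self ? $number : new self($number);
    }
}

function showNumber(int|float|NumberBox $number): string
{
    // The step PHP would perform implicitly under the proposal.
    $number = NumberBox::wrap($number);

    return (string) $number->value();
}

echo showNumber(123), PHP_EOL;                  // 123
echo showNumber(1.23), PHP_EOL;                 // 1.23
echo showNumber(new NumberBox(123)), PHP_EOL;   // 123

try {
    showNumber("123");
} catch (TypeError $e) {
    echo "TypeError", PHP_EOL;                  // reached because strict_types is on
}
```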

Calling the typed built-in methods

When you call the typed built-in methods you either get the expected type, or null.

echo $number->toInt();              // Prints 123
showNumber($number);                // Prints 123

echo $number->toFloat();            // Prints (null) or alternately would throw a type error
echo is_null($number->toFloat());   // Prints 1 (meaning true)

Alternately these would only return the expected type and throw an error if the wrong type is used.

Changing the internal value

To change the internal value of a union you would pass a mixed value to the instance method setValue():

echo $number->toInt();              // Prints 123
echo $number->type();               // Prints int

$number->setValue(123.45);          // Assigns 123.45 to the union's internal value
echo $number->toFloat();            // Prints 123.45
echo $number->type();               // Prints float

Returning values

This proposal would not need any changes in the handling of return values beyond those already envisioned by Nikita's v2 proposal. These would work as expected:

function Foo(): int|string {
    return 1;
}
$i = Foo();
echo gettype($i);     // Prints integer

function Foo(): int|string {
    return "abc";
}
$s = Foo();
echo gettype($s);     // Prints string

And this would return the union class instance, as expected:

function Foo(Number $number): Number {
    return $number;
}
$n = Foo(new Number(123));
echo gettype($n);     // Prints Number

Declaring Properties

Properties would be definable just like in Union Types v2, or by using the union class name, such as the following:

Using Number

class Building {
    public Number $squareMeters;
    public function __construct(Number $squareMeters) {
        $this->squareMeters = $squareMeters;
    }
}

Using int|float

class Building2 {
    public int|float $squareMeters;
    public function __construct(int|float $squareMeters) {
        $this->squareMeters = $squareMeters;
    }
}

Assigning Properties

When a property declared with a union class is assigned a value of one of the unioned types, PHP would automatically wrap that value in an instance of the union class:

$building = new Building(2500);
echo gettype($building->squareMeters);    // Prints Number
echo $building->squareMeters->type();     // Prints int

$building->squareMeters = 5000.0;
echo gettype($building->squareMeters);    // Prints Number
echo $building->squareMeters->type();     // Prints float

However when instantiating an anonymously declared union class, it would behave just like an anonymous class behaves:

$b2 = new Building2(2500);
echo gettype($b2->squareMeters);    // Prints class@anonymous
echo $b2->squareMeters->type();     // Prints int

$b2->squareMeters = 5000.0;
echo gettype($b2->squareMeters);    // Prints class@anonymous
echo $b2->squareMeters->type();     // Prints float

Named Union equivalence to anonymous union (optional, but ideal)

An instance of a named union class should be accepted by a function declared to accept an anonymous union when the two lists of unioned types are equivalent, e.g.:

$building = new Building(2500);
$b2 = new Building2($building->squareMeters);   // Accepts the Number instance and creates $b2

However, the opposite should not be possible, for type safety:

$b2 = new Building2(2500);
$building = new Building($b2->squareMeters);   // Throws a type error.

On the other hand, both of these would be valid:

$b2 = new Building2(2500);
$buildingA = new Building($b2->squareMeters->value());  // Accepts and creates $buildingA
$buildingB = new Building($b2->squareMeters->toInt());  // Accepts and creates $buildingB

Child classes

When a child class is extended from a union class it is also a union class.

namespace MyApp;
class Number extends \Number {
    public string $decimal_point = '.';
    public string $thousands_sep = ',';
    private string $original_type;
    public int $decimal_places = 0;
    public function __construct(int|float|string $value) {
        // Remember what the caller passed before narrowing it to int|float.
        $this->original_type = $value->type();
        if ($value->type() !== 'string') {
            parent::__construct($value->value());
        } else if (false !== ($pos = strpos($value->toString(), $this->decimal_point))) {
            // Numeric string with a decimal point: keep its precision, store a float.
            $this->decimal_places = strlen($value->toString()) - $pos - 1;
            parent::__construct(floatval($value->toString()));
        } else {
            parent::__construct(intval($value->toString()));
        }
    }
    public function toString(): ?string {
        return $this->original_type === 'string'
            ? (string) parent::value()
            : null;
    }
    public function type(): string {
        // Call the built-in parent method; $this->type() would recurse forever.
        return $this->original_type !== 'string'
            ? parent::type()
            : 'string';
    }
    public function value(): mixed {
        return $this->original_type !== 'string'
            ? parent::value()
            : number_format(parent::value(),
                $this->decimal_places,
                $this->decimal_point,
                $this->thousands_sep
            );
    }
}

Reflection

To be fleshed out assuming the rest of this proposal gains traction.

What "magic" would PHP need to provide?

  1. Accepting parameters of one of the unioned types from the caller and transforming them to an instance of the union class within the function.

  2. Providing the type(), value() and setValue() methods as well as the ->to*() methods for the union class without requiring them to be implemented by the class designer.

  3. Automatically creating a new instance of a union class when:

    a. A value is passed to a function where type1|type2|...|typeN is declared as a type parameter but the full anonymous class was not defined; see showNumber2() above as compared to showNumber().

    b. A value is assigned to a property that has been declared to accept a union.

  4. Providing an implied parent class so that parent:: can be used to extend the built-in methods supplied by including the types keyword.
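
To make items 2 and 4 more concrete, here is a minimal sketch of how the built-in methods could be emulated today with an explicit base class and __call(). Everything in it is an assumption made for illustration: UnionBase, the TYPES constant (standing in for the types keyword) and NumberUnion are hypothetical names, and class types are matched by exact name only, not by inheritance.

```php
<?php
declare(strict_types=1);

// Hypothetical base class; under the proposal PHP itself would supply this behavior.
abstract class UnionBase
{
    /** Type names a subclass unions; stands in for the proposed `types` keyword. */
    protected const TYPES = [];

    private mixed $value;

    public function __construct(mixed $value)
    {
        $this->setValue($value);
    }

    public static function types(): array
    {
        return static::TYPES;
    }

    public function setValue(mixed $value): void
    {
        // get_debug_type() yields 'int', 'float', 'string', ... or a class name.
        if (!in_array(get_debug_type($value), static::types(), true)) {
            throw new TypeError('Expected ' . implode('|', static::types()));
        }
        $this->value = $value;
    }

    public function type(): string
    {
        return get_debug_type($this->value);
    }

    public function value(): mixed
    {
        return $this->value;
    }

    // Routes toInt(), toFloat(), toFoo_Bar(), ... to a single implementation.
    public function __call(string $name, array $arguments): mixed
    {
        foreach (static::types() as $type) {
            $method = 'to' . ucfirst(str_replace('\\', '_', $type));
            if (strcasecmp($name, $method) === 0) {
                return $this->type() === $type ? $this->value : null;
            }
        }
        throw new BadMethodCallException("Unknown method {$name}()");
    }
}

final class NumberUnion extends UnionBase
{
    protected const TYPES = ['int', 'float'];
}

$n = new NumberUnion(123);
var_dump($n->toInt());      // int(123)
var_dump($n->toFloat());    // NULL
$n->setValue(1.5);
echo $n->type(), PHP_EOL;   // float
```

A subclass can override type() or value() and still reach the generated behavior through parent::, which is roughly what the implied parent class in item 4 would allow.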

Unaddressed Edge Cases

I am sure there are edge cases, but I wanted to get this proposal into the discussion before the train left the station.

If you find any such edge cases please comment below — possibly providing any suggestions you may have — and I will do my best to address them.

Backwards Incompatible Changes

This proposal does not contain any backwards incompatible changes as far as I am aware.

Prior Art

Some will (and should) see some similarities between this proposal and interface{} types in GoLang. However, while this proposal was influenced by its author's use of Go interfaces, it is not proposing to copy Go interfaces, as Go and PHP are two significantly different languages.

End of Proposal

mindplay-dk commented Sep 17, 2019

can you give me use-cases where interfaces will not suffice instead of generics?

Collections are always a good use-case, and probably one of the most-cited reasons people want generics in Go.

So, take this example in TypeScript for example:

class Collection<T> {
    private items: Array<T> = [];

    add(item: T) {
        this.items.push(item);
    }

    has(item: T) {
        return this.items.indexOf(item) !== -1;
    }

    remove(item: T) {
        const index = this.items.indexOf(item);

        // splice() removes in place; slice() would only return a copy
        if (index !== -1) {
            this.items.splice(index, 1);
        }
    }
}

class Hat {}

const items = new Collection<Hat>();

items.add(new Hat()); // static type-safety

This lets us write a collection-type that provides static type-checking, for any type, without Hat needing to implement a particular interface or having any awareness of collections, and without having to define a specific collection-type for hats just to get that type-safety.

Now to contrast with run-time type-safety in dynamic languages like plain JS:

// cache of generated classes, kept outside the function so it survives between calls
const types = new Map();

function Collection(T) {
    if (! types.has(T)) {
        class Collection {
            constructor() {
                this.items = [];
            }

            add(item) {
                // run-time type-checking:
                if (! (item instanceof T)) {
                    throw new TypeError(`unexpected item type: ${typeof item}`);
                }

                this.items.push(item);
            }

            has(item) {
                return this.items.indexOf(item) !== -1;
            }

            remove(item) {
                const index = this.items.indexOf(item);

                // splice() removes in place; slice() would only return a copy
                if (index !== -1) {
                    this.items.splice(index, 1);
                }
            }
        }

        types.set(T, Collection);
    }

    return types.get(T);
}

class Hat {}

const items = new (Collection(Hat))();
//                            |    ^^ constructor arguments
//                            ^ "type argument"

items.add(new Hat()); // run-time type-safety

As you can see, the same thing is possible at run-time - although this has performance implications, some langs literally implement run-time generic type-checks in a similar way; most probably don't, but, at least conceptually, the idea is very similar and you don't need to worry too much about how it's implemented in practice.
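
For PHP readers, the same run-time approach could look roughly like the sketch below. It is an illustration only: TypedCollection and Hat are hypothetical names, the "type argument" is passed as an ordinary class-name string, and nothing here is checked statically.

```php
<?php
declare(strict_types=1);

// Hypothetical run-time counterpart to Collection<T>: the item class is a constructor argument.
final class TypedCollection
{
    private array $items = [];

    public function __construct(private string $itemClass) {}

    public function add(object $item): void
    {
        // run-time type-checking, the analogue of `item instanceof T`
        if (!is_a($item, $this->itemClass)) {
            throw new TypeError('unexpected item type: ' . get_debug_type($item));
        }
        $this->items[] = $item;
    }

    public function has(object $item): bool
    {
        return in_array($item, $this->items, true);
    }

    public function remove(object $item): void
    {
        $index = array_search($item, $this->items, true);
        if ($index !== false) {
            array_splice($this->items, $index, 1);
        }
    }
}

class Hat {}

$hats = new TypedCollection(Hat::class);
$hat = new Hat();
$hats->add($hat);               // run-time type-safety
var_dump($hats->has($hat));     // bool(true)

try {
    $hats->add(new stdClass());
} catch (TypeError $e) {
    echo $e->getMessage(), PHP_EOL;   // unexpected item type: stdClass
}
```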

As far as the syntax difference, well, in the first example, the type argument refers to Hat the type - something that exists only in the compiler, not at run-time. As you might note, this works for types that don't have a class constructor associated with them, e.g. type expressions or interface declarations in TS, or imported types with a class constructor that isn't loaded or reachable at the time, and so forth. The syntax with angle brackets for those parameters lets you know that these are parameters to a "function" that generates a parameterized class or function at compile time.

Maybe it is because I find languages that use keywords — like type — rather than symbols easier to reason about, or more likely because I tend to find structural typing easier to work with than nominal typing, and Go's contracts are more like structural typing than nominal typing, although Go's contracts are in fact named, they represent a set of capabilities which is what structural typing is about.

As said, YMMV 🙂

To me, the use of a seemingly type-hint for a seemingly type named Type is rather confusing - referencing a type and generating a parameterized class/function is a compile-time operation, so the syntax distinction in other langs makes a lot of sense to me.

Again though, YMMV 😉
