
@mikeschinkel
Last active September 17, 2019 11:24

Proposal for Union Class Types in PHP

This is a strawman proposal as an addition to the Union Types v2 proposal from Nikita Popov.

This proposal introduces the concept of a special type of class called a union class, which is created by adding a types keyword followed by a vertical-bar-separated list of types, just like the lists in Nikita's proposal.

Note: I know that mixed is not a valid typehint, but I use it below anyway for clarity.

Benefits

The benefits of this proposal over and above the union_types_v2 proposal include:

  1. Addresses the scenarios where type aliases would be desired in Union Types v2, while providing a more robust and type-safe alternative to the two general approaches envisioned by that proposal.

  2. Leverages existing syntax and semantics of classes and objects, requiring only a small amount of language change.

  3. Full type safety when accessing typed values passed to functions, and anywhere else union instances are used.

  4. No ambiguity of types except where specifically wanted, e.g. value() and setValue()

  5. Ability to capture into one variable the value passed from the caller and then pass it to called functions without having to first dereference the typed value.

  6. Ability to create, manipulate and pass around unioned values and use them in contexts not yet envisioned by the PHP language's implementors.

Specifics

Declaring a union

A very simple Number union class would look like this:

class Number {
    types int|float;
}

Built-in methods a union declaration would imply

Any class defined with a types keyword would automatically get at least five (5) methods:

  • Three (3) specifically-named methods; e.g. value(), setValue() and type(), and
  • At least two (2) methods, each with the name to{Type}(), one for each unioned type.

So for the Number union the built-in methods would be:

  • public function type():string
  • public function value():mixed
  • public function setValue(mixed)
  • public function toInt():?int
  • public function toFloat():?float

If we added a string type to the union there would be a sixth method:

  • public function toString():?string

If we unioned a class Foo then there would be another method:

  • public function toFoo():?Foo

If we unioned a namespaced class \Foo\Bar then there would be yet another method:

  • public function toFoo_Bar():?\Foo\Bar

There would also be a static method that would return an array of the types defined in the class:

  • public static function types():?string[]
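
To make the implied methods concrete, here is a minimal userland sketch of what the engine would generate for the Number union. It is only an approximation written against today's PHP (it assumes PHP 8.0+ for the int|float and mixed declarations), and the class name NumberUnion is hypothetical, chosen so it does not clash with the proposed Number class.

```php
<?php
declare(strict_types=1);

// Hypothetical userland stand-in for `class Number { types int|float; }`.
// Every method here is written by hand; under the proposal the engine
// would supply them automatically.
final class NumberUnion
{
    private int|float $value;

    public function __construct(int|float $value)
    {
        $this->value = $value;
    }

    /** Names of the unioned types, mirroring the proposed static types(). */
    public static function types(): array
    {
        return ['int', 'float'];
    }

    /** Name of the type currently held: 'int' or 'float'. */
    public function type(): string
    {
        return is_int($this->value) ? 'int' : 'float';
    }

    public function value(): mixed
    {
        return $this->value;
    }

    public function setValue(int|float $value): void
    {
        $this->value = $value;
    }

    /** Returns the value only if it is currently an int, otherwise null. */
    public function toInt(): ?int
    {
        return is_int($this->value) ? $this->value : null;
    }

    /** Returns the value only if it is currently a float, otherwise null. */
    public function toFloat(): ?float
    {
        return is_float($this->value) ? $this->value : null;
    }
}

$n = new NumberUnion(123);
echo $n->type(), "\n";      // int
var_dump($n->toFloat());    // NULL
$n->setValue(123.45);
echo $n->type(), "\n";      // float
```

The sketch uses the nullable variant of the to{Type}() methods; the proposal leaves open whether they should return null or throw.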

Declaring a union with methods

A more complex union class, showing how a union would be used, would look like this:

class Number {
    types int|float;

    public function __construct(int|float $number) {
        $this->setValue($number);
    }

    public function getInt(): int {
        switch ( $this->type() ) {
        case  'int':
            return $this->toInt();
        case  'float':
            return intval($this->toFloat());
        }
        return 0;
    }

    public function getFloat(): float {
        switch ( $this->type() ) {
        case  'int':
            return 1.0 * $this->toInt();
        case  'float':
            return $this->toFloat();
        }
        return 0.0;
    }
}

Accepting params into a union instance

Here is an example using an anonymous class:

function showNumber(new class{types int|float} $number) {
   echo $number->value();
}

And then a shorthand I propose which would be the equivalent of the prior example:

function showNumber2(int|float $number) {
   echo $number->value();
}

These functions would be called like so:

showNumber(123);          // Prints 123
showNumber(1.23);         // Prints 1.23
showNumber("123");        // Throws a type error

These functions would also accept a matching union type instance instead of automatically creating one when called:

$number = new Number(123);
showNumber($number);      // Prints 123

Calling the typed built-in methods

When you call the typed built-in methods you either get the expected type, or null.

echo $number->toInt();              // Prints 123
showNumber($number);                // Prints 123

echo $number->toFloat();            // Prints (null) or alternately would throw a type error
echo is_null($number->toFloat());   // Prints 1 (meaning true)

Alternately these would only return the expected type and throw an error if the wrong type is used.
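
The throwing alternative could behave roughly like the sketch below. The function name strictToInt and the error message are hypothetical, and the code assumes PHP 8.0+ (for union type declarations and get_debug_type()).

```php
<?php
declare(strict_types=1);

// Hypothetical strict variant of toInt(): returns the int held in
// $value, or throws instead of returning null.
function strictToInt(int|float $value): int
{
    if (!is_int($value)) {
        throw new TypeError('union holds ' . get_debug_type($value) . ', not int');
    }
    return $value;
}

echo strictToInt(123), "\n";            // 123

try {
    strictToInt(123.45);
} catch (TypeError $e) {
    echo $e->getMessage(), "\n";        // union holds float, not int
}
```

The trade-off is the usual one: the nullable form forces callers to check for null, while the throwing form fails loudly at the point of misuse.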

Changing the internal value

To change the internal value of a union you would pass a mixed value to the instance method setValue():

echo $number->toInt();              // Prints 123
echo $number->type();               // Prints int

$number->setValue(123.45);          // Assigns 123.45 to the union's internal value
echo $number->toFloat();            // Prints 123.45
echo $number->type();               // Prints float

Returning values

This proposal would not need any changes in the handling of return values beyond those already envisioned by Nikita's v2 proposal. These could work as expected:

function Foo(): int|string {
    return 1;
}
$i = Foo();
echo gettype($i);     // Prints integer

function Bar(): int|string {
    return "abc";
}
$s = Bar();
echo gettype($s);     // Prints string

And this would return the union class instance, as expected:

function Foo(Number $number): Number {
    return $number;
}
$n = Foo(new Number(123));
echo get_class($n);   // Prints Number

Declaring Properties

Properties would be definable just like in Union Types v2, or by using the union class name, such as the following:

Using Number

class Building {
    public Number $squareMeters;
    public function __construct(Number $squareMeters) {
        $this->squareMeters = $squareMeters;
    }
}

Using int|float

class Building2 {
    public int|float $squareMeters;
    public function __construct(int|float $squareMeters) {
        $this->squareMeters = $squareMeters;
    }
}

Assigning Properties

Properties when assigned one of the union types would automatically instantiate a union class type:

$building = new Building(2500);
echo get_class($building->squareMeters);  // Prints Number
echo $building->squareMeters->type();     // Prints int

$building->squareMeters = 5000.0;
echo get_class($building->squareMeters);  // Prints Number
echo $building->squareMeters->type();     // Prints float

However when instantiating an anonymously declared union class, it would behave just like an anonymous class behaves:

$b2 = new Building2(2500);
echo get_class($b2->squareMeters);  // Prints class@anonymous
echo $b2->squareMeters->type();     // Prints int

$b2->squareMeters = 5000.0;
echo get_class($b2->squareMeters);  // Prints class@anonymous
echo $b2->squareMeters->type();     // Prints float

Named Union equivalence to anonymous union (optional, but ideal)

An instance of a declared named union class should be able to be passed to a function declared to accept an anonymous union when the two lists of unioned types are equivalent, e.g.:

$building = new Building(2500);
$b2 = new Building2($building->squareMeters);   // Accepts and creates $b2

However, the opposite should not be possible, for type safety:

$b2 = new Building2(2500);
$building = new Building($b2->squareMeters);    // Throws a type error.

On the other hand, both of these would be valid:

$b2 = new Building2(2500);
$buildingA = new Building($b2->squareMeters->value());  // Accepts and creates $buildingA
$buildingB = new Building($b2->squareMeters->toInt());  // Accepts and creates $buildingB
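
One way to read "equivalent" in this section is that two unions list the same set of types, regardless of order. The sketch below shows such a check over the arrays a static types() call would return; the function name unionsAreEquivalent is hypothetical, and the proposal does not spell out the exact rule.

```php
<?php
declare(strict_types=1);

// Hypothetical equivalence test: two unions are treated as equivalent
// when they name the same types, ignoring order and duplicates.
function unionsAreEquivalent(array $a, array $b): bool
{
    $a = array_unique($a);
    $b = array_unique($b);
    sort($a);   // sort() also re-indexes, so === compares values only
    sort($b);
    return $a === $b;
}

var_dump(unionsAreEquivalent(['int', 'float'], ['float', 'int']));            // bool(true)
var_dump(unionsAreEquivalent(['int', 'float'], ['int', 'float', 'string']));  // bool(false)
```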

Child classes

When a child class is extended from a union class it is also a union class.

namespace MyApp;
class Number extends \Number {
    public string $decimal_point = '.';
    public string $thousands_sep = ',';
    private string $original_type;
    public int $decimal_places = 0;
    public function __construct(int|float|string $value) {
        // Remember how the value arrived before it is normalized to int or float
        $this->original_type = $value->type();
        if ($value->type()!=='string') {
            parent::__construct($value);
        } else if (false!==($pos=strpos($value,$this->decimal_point))) {
            $this->decimal_places = strlen($value)-$pos-1;
            parent::__construct(floatval($value));
        } else {
            parent::__construct(intval($value));
        }
    }
    public function toString():?string {
        return $this->original_type==='string'
            ? (string)parent::value()
            : null;
    }
    public function type():string {
        // Call the built-in type() via parent:: to avoid infinite recursion
        return $this->original_type!=='string'
            ? parent::type()
            : 'string';
    }
    public function value():mixed {
        return $this->original_type!=='string'
            ? parent::value()
            : number_format(parent::value(),
                $this->decimal_places,
                $this->decimal_point,
                $this->thousands_sep
            );
    }
}

Reflection

To be fleshed out assuming the rest of this proposal gains traction.

What "magic" would PHP need to provide?

  1. Accepting parameters of one of the unioned types from the caller and transforming them to an instance of the union class within the function.

  2. Providing the type(), value() and setValue() methods as well as the ->to*() methods for the union class without requiring them to be implemented by the class designer.

  3. Automatically creating a new instance of a union class when:

    a. A value is passed to a function where type1|type2|...|typeN is declared as a type parameter but the full anonymous class was not defined; see showNumber2() above as compared to showNumber().

    b. A value is assigned to a property that has been declared to accept a union.

  4. Providing an implied parent class so that parent:: can be used to extend the methods that are built in by including the types keyword.
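
Items 1 and 3 amount to boxing at the call boundary. A rough userland approximation, assuming PHP 8.0+, is sketched below; BoxedNumber, box() and showBoxedNumber() are hypothetical names, and under the proposal the engine would perform the box() step implicitly.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal union wrapper, standing in for `types int|float`.
final class BoxedNumber
{
    public function __construct(private int|float $value) {}

    public function value(): int|float
    {
        return $this->value;
    }

    public function type(): string
    {
        return get_debug_type($this->value);   // 'int' or 'float'
    }
}

// What the engine would do at the call boundary: pass an existing
// instance through untouched, and wrap a raw int or float in a new one.
function box(int|float|BoxedNumber $arg): BoxedNumber
{
    return $arg instanceof BoxedNumber ? $arg : new BoxedNumber($arg);
}

function showBoxedNumber(int|float|BoxedNumber $number): void
{
    $number = box($number);
    echo $number->value(), "\n";
}

showBoxedNumber(123);                    // 123
showBoxedNumber(1.23);                   // 1.23
showBoxedNumber(new BoxedNumber(123));   // 123
```

With strict_types enabled, calling showBoxedNumber("123") raises a TypeError, which matches the behavior the proposal describes for showNumber("123").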

Unaddressed Edge Cases

I am sure there are edge cases, but I wanted to get this proposal into the discussion before the train left the station.

If you find any such edge cases please comment below — possibly providing any suggestions you may have — and I will do my best to address them.

Backwards Incompatible Changes

This proposal does not contain any backwards incompatible changes as far as I am aware.

Prior Art

Some will (and should) see some similarities between this proposal and interface{} types in GoLang. However, although this proposal was influenced by its author's use of Go interfaces, it does not propose copying them, as Go and PHP are two significantly different languages.

End of Proposal

@mindplay-dk

Any class defined with a types keyword would automatically get at least at least six (6) methods

So the custom syntax etc. is really just an abstract, generic base-class, right?

class Number extends ValueType<int|float> {
    // ...
}

The way I see it, this feature would be a much better fit if we had type unions and generics in place first.

@mikeschinkel
Author

@mindplay-dk

Thank you for the comment.

"is really just an abstract, generic base-class, right?"

Probably.

OTOH, my personal opinion is that generics are one of the more unfortunate features added to the languages that have them. Not because the capabilities they provide are not useful, but because the syntax is so mind-bending for me, and I have never been able to reason about generics without a large amount of effort.

GoLang is currently debating the addition of something that provides the capabilities of generics, and after many years they finally have a draft design that most of the community seems to be attracted to. It uses contracts rather than generics, and is a much better approach than adding the syntax salad that is generics.

In my personal opinion (IMPO), of course.

@mindplay-dk

Most people find generics simple once they grasp the idea.

It's often presented using terminology (and syntax, depending on the language) that can be a little confusing.

My favorite way to conceptualize it is to think of a generic function as a function returning a function. Think of a generic class as a function returning a class. Conceptually, that's very close to the truth, even if the reality of how it's actually implemented by the compiler/interpreter is more complicated.

In terms of syntax, just think of the angle brackets as a different type of parens - but you're really just calling a function, and it happens to accept arguments of the type "type", rather than the value-types you usually provide as arguments.

Just think of class Box<T> { ... } as declaring a "function" that takes an argument of the type "type".

So given new Box<Hat>(new Hat()), the Box<Hat> part is just a "function-call" - you're passing an argument Hat of the type "type", and the "function-call" evaluates to another "type", and the new operator and constructor argument new Hat() are then applied to that.

If you understand functions, and can conceptualize "types" as being just another type of value, you basically understand generics. :-)

The syntax of the draft design for Go seems to take that conceptualization quite literally:

type parameters are similar to ordinary non-type function parameters, and as such should be listed along with other parameters. However, type parameters are not the same as non-type parameters, so although they appear in the list of parameters we want to distinguish them. That leads to our next design decision: we define an additional, optional, parameter list, describing type parameters. This parameter list appears before the regular parameters. It starts with the keyword type, and lists type parameters.

If that makes more sense to you, conceptually, generics in most other languages aren't that different - it's just that most languages use the <> brackets to visibly distinguish type-arguments from regular arguments, both in declarations and at call sites.

Admittedly, the lack of this explanation, combined with the syntax distinction, was "mind-bending" to me for a while as well. I don't know that Go's proposal really helps things though? The use of regular parens makes both declarations and call sites look very ambiguous to me - the Type keyword just looks like a type-hint, so I find it somewhat redundant, and it doesn't help the readability much. YMMV of course :-)

@mikeschinkel
Author

@mindplay-dk — Thank you for taking the time to explain generics in detail.

However, understanding the concept of generics has never been my problem. I understand them perfectly. I even understood that I needed them back in the late 80s when I was programming in C before I knew they existed in C++.

What my concerns are with generics is reasoning about them, in practice. I can probably explain best with an analogy to high school algebra. I can solve an equation with one or two unknowns in my head. But to solve for 3 or more unknowns requires pencil and paper, at least for me.

So when I try to reason about code that uses Java-style generics I feel like I am trying to solve an equation with 4 unknowns in my head, and so I end up having to use pencil and paper to get my head around code, and for me that makes coding a lot more tedious and much less enjoyable.

For some reason the Go-style contracts make so much more sense to me. And although I have yet to program with them I feel like they will be much easier to reason about than code where I have to mentally translate type abstraction in every expression that uses them.

Maybe it is because I find languages that use keywords — like type — rather than symbols easier to reason about, or more likely because I tend to find structural typing easier to work with than nominal typing, and Go's contracts are more like structural typing than nominal typing, although Go's contracts are in fact named, they represent a set of capabilities which is what structural typing is about.

The fact that I can think about a function accepting parameters each with a single (pseudo-)type makes it so much easier for me to reason about than having to reason about a function's logic that hinges on an abstraction.

I will readily admit that I have met many other programmers who can maintain more complexity in their head than I can. It is quite possible they have higher IQs than me, I don't know. But what I do know is that whenever I am in control of the code I ruthlessly simplify it so that it can be understood without maintaining a lot of details in one's head. And when I have to work with languages that use Java-style generics that infect many of the open-source libraries available, that control of being able to simplify code is taken away from me.

#fwiw

@mikeschinkel
Author

@mindplay-dk — A quick follow up, using your example, can you give me use-cases where interfaces will not suffice instead of generics? Ideally with example code that would show the hypothetical generics:

interface Boxable{
	public function name():string;
}
class Hat implements Boxable{
	private string $name;
	public function name():string {
		return $this->name;
	}
}
function ShowBox(Boxable $box) {
	echo $box->name();	
}

@mindplay-dk

mindplay-dk commented Sep 17, 2019

can you give me use-cases where interfaces will not suffice instead of generics?

Collections are always a good use-case, and probably one of the most-cited reasons people want generics in Go.

So, take this example in TypeScript:

class Collection<T> {
    private items: Array<T> = [];

    add(item: T) {
        this.items.push(item);
    }

    has(item: T) {
        return this.items.indexOf(item) !== -1;
    }

    remove(item: T) {
        const index = this.items.indexOf(item);

        // splice() removes in place; slice() would only return a copy
        if (index !== -1) {
            this.items.splice(index, 1);
        }
    }
}

class Hat {}

const items = new Collection<Hat>();

items.add(new Hat()); // static type-safety

This lets us write a collection-type that provides static type-checking, for any type, without Hat needing to implement a particular interface or having any awareness of collections, and without having to define a specific collection-type for hats just to get that type-safety.

Now to contrast with run-time type-safety in dynamic languages like plain JS:

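// Cache of generated classes, keyed by element type. It lives outside the
// factory so repeated calls with the same T return the same class.
const types = new Map();

function Collection(T) {
    if (! types.has(T)) {
        class Collection {
            constructor() {
                this.items = [];
            }

            add(item) {
                // run-time type-checking:
                if (! (item instanceof T)) {
                    throw new TypeError("unexpected item type: " + typeof item);
                }

                this.items.push(item);
            }

            has(item) {
                return this.items.indexOf(item) !== -1;
            }

            remove(item) {
                const index = this.items.indexOf(item);

                // splice() removes in place; slice() would only return a copy
                if (index !== -1) {
                    this.items.splice(index, 1);
                }
            }
        }

        types.set(T, Collection);
    }

    return types.get(T);
}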

class Hat {}

const items = new (Collection(Hat))();
//                            |    ^^ constructor arguments
//                            ^ "type argument"

items.add(new Hat()); // run-time type-safety

As you can see, the same thing is possible at run-time - although this has performance implications, some langs literally implement run-time generic type-checks in a similar way; most probably don't, but, at least conceptually, the idea is very similar and you don't need to worry too much about how it's implemented in practice.
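
For readers more at home in PHP, the same run-time idea can be sketched there too. This is only an illustration, assuming PHP 8.0+; TypedCollection, Hat and Shoe are hypothetical names. Instead of a function that returns a class, the element type is passed as an ordinary constructor argument and checked on every add().

```php
<?php
declare(strict_types=1);

// Hypothetical run-time "generic" collection: the element type is a
// class name given at construction and enforced with instanceof.
final class TypedCollection
{
    /** @var object[] */
    private array $items = [];

    /** @param string $type fully qualified class name of the elements */
    public function __construct(private string $type) {}

    public function add(object $item): void
    {
        $type = $this->type;
        // run-time type-checking:
        if (!($item instanceof $type)) {
            throw new TypeError('unexpected item type: ' . get_debug_type($item));
        }
        $this->items[] = $item;
    }

    public function has(object $item): bool
    {
        return in_array($item, $this->items, true);
    }

    public function remove(object $item): void
    {
        $index = array_search($item, $this->items, true);
        if ($index !== false) {
            array_splice($this->items, $index, 1);
        }
    }
}

class Hat {}
class Shoe {}

$hats = new TypedCollection(Hat::class);
$hat = new Hat();
$hats->add($hat);
var_dump($hats->has($hat));   // bool(true)
```

Adding a Shoe to $hats throws a TypeError, but only when the call actually runs; no static checker can catch it from the signature add(object $item) alone, which is the gap generics are meant to close.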

As far as the syntax difference, well, in the first example, the type argument refers to Hat the type - something that exists only in the compiler, not at run-time. As you might note, this works for types that don't have a class constructor associated with them, e.g. type expressions or interface declarations in TS, or imported types with a class constructor that isn't loaded or reachable at the time, and so forth. The syntax with angle brackets for those parameters lets you know that these are parameters to a "function" that generates a parameterized class or function at compile time.

Maybe it is because I find languages that use keywords — like type — rather than symbols easier to reason about, or more likely because I tend to find structural typing easier to work with than nominal typing, and Go's contracts are more like structural typing than nominal typing, although Go's contracts are in fact named, they represent a set of capabilities which is what structural typing is about.

As said, YMMV 🙂

To me, the use of a seemingly type-hint for a seemingly type named Type is rather confusing - referencing a type and generating a parameterized class/function is a compile-time operation, so the syntax distinction in other langs makes a lot of sense to me.

Again though, YMMV 😉
