@Danack
Last active February 12, 2020 16:41
Extending scalar types

Introduction

Type systems make it easier to reason about software.

They limit the number of possible things that can happen in the code by restricting what types of values can be passed to or returned from functions.

As well as making it easier for automated tools (e.g. IDEs, or static analysis tools such as Psalm and PHPStan) to detect errors in the code, this makes it easier for humans to understand what the code is doing.

PHP 7.0 added scalar type declarations to the language. They are useful, but because those types cannot be extended, bugs can still slip through.

For example, if you have a function that takes two strings, where those strings represent different kinds of values, it's possible to pass the arguments in the wrong order.

function sendEmail(string $email_address, string $name) {...}

class User {
    public function getName(): string { ... }

    public function getEmailAddress(): string { ... }
}


// Wrong way round
sendEmail($user->getName(), $user->getEmailAddress());

// Correct order.
sendEmail($user->getEmailAddress(), $user->getName());

Because both parameters have the type 'string', neither humans nor tools can readily see that the first call is a mistake.

Although it's possible to define a 'strong' type for scalars in userland, doing so is currently annoyingly verbose, and it makes it difficult for libraries, and code using those libraries, to co-operate.
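For reference, the userland approach that works today looks roughly like the sketch below. The class names EmailAddress and Name, their value() accessor, and the string returned by sendEmail() are assumptions made for illustration; they are not part of this RFC. It needs PHP 7.4 or later for typed properties.

```php
<?php

declare(strict_types=1);

// Hypothetical userland wrappers; each one needs its own boilerplate.
final class EmailAddress
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function value(): string
    {
        return $this->value;
    }
}

final class Name
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function value(): string
    {
        return $this->value;
    }
}

// Returns the message instead of sending it, to keep the sketch self-contained.
function sendEmail(EmailAddress $emailAddress, Name $name): string
{
    return "To: " . $name->value() . " <" . $emailAddress->value() . ">";
}

$name = new Name("John");
$email = new EmailAddress("john@example.com");

// Correct order.
$message = sendEmail($email, $name);

// Wrong way round: the engine now rejects the call with a TypeError.
$swappedCallRejected = false;
try {
    sendEmail($name, $email);
} catch (TypeError $e) {
    $swappedCallRejected = true;
}
```

The swapped call is caught, but every wrapper repeats the same constructor and accessor, and every use site has to call value() explicitly.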

Proposal

This RFC proposes adding the ability to 'extend' scalar types, so that more specific types can be used easily.

This is done through the introduction of an interface that defines a magic method '__value()':

interface ScalarType {
	public function __value(): int|float|string|bool;
}

Because PHP now supports covariant return types, an implementing class does not need to declare a return type of 'int|float|string|bool'; it can declare a narrower return type instead. For example:

class FirstName implements ScalarType
{
    private string $value;

    public function __construct(string $value) {
        $this->value = $value;
    }

    public function __value(): string {
        return $this->value;
    }
}

An object that implements the ScalarType interface will be coerced to a scalar value under the following circumstances.

When the object is used in an operation.

$firstName = new FirstName("John");

echo "Hello " . $firstName;
// output is "Hello John"

When a cast operation is performed on the object.

$firstName = new FirstName("John");
$initialFirstName = (string)$firstName;
var_dump($initialFirstName);
// string(4) "John"
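The closest behaviour in PHP today is __toString(), which already covers the two string cases above: concatenation and an explicit (string) cast. The sketch below uses only existing functionality and is shown for comparison; the class name LegacyFirstName is made up for illustration.

```php
<?php

// Existing PHP behaviour, shown for comparison; not part of this proposal.
final class LegacyFirstName
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

$firstName = new LegacyFirstName("John");

// Used in an operation: concatenation calls __toString().
$greeting = "Hello " . $firstName;

// Explicit cast: also calls __toString().
$initialFirstName = (string)$firstName;
```

As this RFC describes it, __value() would extend the same idea to int, float, and bool values.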

When a ScalarType object is passed as an argument to a function and the parameter's type check would otherwise fail.

Example 1 - simple conversion.

<?php

declare(strict_types = 1);

function foo(string $value) {
    var_dump($value);
}

$firstName = new FirstName("John");
foo($firstName);
// string(4) "John"

Because the function only takes a string value, the FirstName is coerced to a scalar first by calling the $firstName->__value() method.

Example 2 - no conversion.

<?php

declare(strict_types = 1);

function foo(FirstName|string $value) {
    var_dump($value);
}

$firstName = new FirstName("John");
foo($firstName);

// var_dump output is:
// class FirstName#1 (1) {
//     private $value =>
//     string(4) "John"
// }

Because the type is acceptable as a FirstName without coercion, no coercion occurs.

Example 3 - bad conversion.

<?php

declare(strict_types = 1);

function bar(int $value) {...}

$firstName = new FirstName("John");
bar($firstName);

// Uncaught TypeError: Argument 1 passed to bar() must be of the type integer, FirstName given

The implementation of this RFC will need to ensure that the original type is included in the error message, rather than the type returned by __value().

When a ScalarType object is returned from a function and the return type check would otherwise fail.

function foo() : string {
    $firstName = new FirstName("John");
    return $firstName;
}

In all of the above places where a ScalarType is coerced to a scalar value, the PHP engine will call the __value() method to get the value. This RFC does not propose changing the current type coercion/juggling rules.
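The lookup the engine would perform can be approximated in userland. The sketch below assumes PHP 8.0 or later for union types; the helper name toScalar() is hypothetical, and the ScalarType interface and FirstName class are redeclared from the examples above so that the block runs on its own. It only illustrates the single rule "call __value() if the value is a ScalarType"; it does not emulate operators or parameter type checks.

```php
<?php

declare(strict_types=1);

interface ScalarType
{
    public function __value(): int|float|string|bool;
}

class FirstName implements ScalarType
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function __value(): string
    {
        return $this->value;
    }
}

// Hypothetical helper: unwraps a ScalarType, passes plain scalars through.
function toScalar(ScalarType|int|float|string|bool $value): int|float|string|bool
{
    if ($value instanceof ScalarType) {
        return $value->__value();
    }

    return $value;
}

$fromObject = toScalar(new FirstName("John"));
$fromScalar = toScalar(42);
```

Under the proposal this unwrapping would happen implicitly, so calling code would never need a helper like this.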

Additionally, to avoid every project in the world having to define its own string, int, float, and bool wrapper classes, this RFC proposes adding these classes to PHP core.

class StringType implements ScalarType {
	protected string $value;
	
	public function __value(): string {
		return $this->value;
	}
}

class IntType implements ScalarType {
	protected int $value;
	
	public function __value(): int {
		return $this->value;
	}
}

class FloatType implements ScalarType {
	protected float $value;
	
	public function __value(): float {
		return $this->value;
	}
}

class BoolType implements ScalarType {
	protected bool $value;
	
	public function __value(): bool {
		return $this->value;
	}
}

Examples of use

Stringy example

function getMessage(Name $firstName, EmailAddress $email) {
	return "This email was sent to " . $firstName . " at " . $email;
}

Int-ish example.

class UserAge extends IntType {}
class MinimumAge extends IntType {}

function isOldEnoughToPurchaseItem(UserAge $age, Item $item)
{	
	if ($age >= $item->getMinimumAgeForPurchase()) {
		return true;
	}

	return false;
}

Side benefit - obviates the need for references.

People have legitimate uses for references, for things like:


function processItems($items, &$total) {
    foreach ($items as $item) {
        $total += $item->quantity();
    }  
} 

$total = 0;
processItems($items, $total);
printf("There are %d items", $total);

However, using references has a couple of severe downsides:

The type of references can be changed.

function foo(int &$bar) {

    if (rand(0, 100) === 0) {
        $bar = 'one';
    }
}

$total = 0;
foo($total);

// what type does $total have?

The raw value is exposed, which means that there is no limit on how the value can be changed.

This RFC gives a type-safe way to pass an integer around an application, without as much boilerplate as the current syntax for classes requires.

class RunningTotal extends IntType
{
    public function __construct() {
        $this->value = 0;
    }

    public function add(int $number)
    {
        $this->value += $number;
    }
}


function processItems($items, RunningTotal $total)
{
    foreach ($items as $item) {
        $total->add($item->quantity());
    }  
} 

$total = new RunningTotal;
processItems($items, $total);
printf("There are %d items", $total);

Questions to be answered

Will this be future proof?

For example, if we decided to add a Decimal type (for arbitrary-precision decimal arithmetic), would we be able to change the ScalarType definition to be

interface ScalarType {
	public function __value(): int|float|string|bool|Decimal;
}

without that change causing (large) BC problems?
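One part of this question can be checked against the existing covariance rules (PHP 8.0 or later for union types). In the sketch below, Decimal is a hypothetical placeholder class, not a real PHP type. An implementer that declares a narrower return type stays valid after the interface is widened; what could still break is calling code that assumes __value() only ever returns one of the four original scalar types.

```php
<?php

declare(strict_types=1);

// Hypothetical placeholder for a future arbitrary-precision type.
final class Decimal
{
}

// The widened interface from the question above.
interface ScalarType
{
    public function __value(): int|float|string|bool|Decimal;
}

// Written against the original interface; still a valid implementation,
// because 'string' is narrower than the widened union.
class FirstName implements ScalarType
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function __value(): string
    {
        return $this->value;
    }
}

$firstName = new FirstName("John");
$value = $firstName->__value();
```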

What about complex type checking?

Something like this would be okay:

$firstname = new FirstName("John");
function foo(string|int $x) {...}
foo($firstname);

But will there be problems around more ambiguous types?

Should there be a number type?

Should we add a number type to PHP core that can represent either floats or ints? e.g. something like:

class NumberType implements ScalarType {
	protected float|int $value;
	
	public function __value(): float|int {
		return $this->value;
	}
}

FAQ

Why a magic method?

Because it's magic. This interface binds together two completely different kinds of code: class-based methods, and operators that work directly on values. It's appropriate to use a 'magic method', as this is the way that PHP indicates that a particular method is used in a special way.

Why not also arrays?

The position of this RFC is that arrays in PHP are likely to change, either through a successful generics implementation or through a refactoring of arrays to make them easier to use.

Because of that, it would be inappropriate for RFCs to add more functionality to arrays right now.

Why a single method, instead of __toString, __toInt etc?

The main reason is usability. With a single method you can write code like this:

class NumberType implements ScalarType
{
    private int|float $value;

    public function __construct(float|int $value) {
        $this->value = $value;
    }

    public function __value(): float|int {
        return $this->value;
    }
}


function checkValueFromCallbackIsLessThanLimit($callback, int $limit) {

    $value = $callback();

    if ($value < $limit) {
        return true;
    }
    
    return false;
}


$number = new NumberType(3);
$callback = Closure::fromCallable([$number, '__value']);
checkValueFromCallbackIsLessThanLimit($callback, 10);

Using separate methods to get the value would require some extra inspection methods to know which would be the appropriate method to call. I can't see how the code above could be written without either extra boilerplate, or extra magic.

There are cases where a single value can be represented in different ways, for example IPv4 addresses, which can be represented either as a string or as an integer.

class Ip4Address implements ScalarType
{
    private int|string $value;

    // Instances are created through the named constructors below.
    private function __construct(int|string $value)
    {
        $this->value = $value;
    }

    public function __value(): int|string {
        return $this->value;
    }

    public static function fromLong(int $longIpAddress): self
    {
        // TODO - check ip value is valid
        return new self($longIpAddress);
    }

    public static function fromString(string $ipAddress): self
    {
        // TODO - check ip value is valid %d.%d.%d.%d
        return new self($ipAddress);
    }
}

The position of this RFC is that if you want to have a single type be representable as multiple scalar types, then that is fine.

Is this proposal compatible with the 'Covariant Returns and Contravariant Parameters' RFC?

Yes. https://wiki.php.net/rfc/covariant-returns-and-contravariant-parameters

Examples of why current userland strong scalar types are not so good.

Having to wrap all the things.

class UserAge {
	private int $value;
	
	public function __construct(int $value) {
	    $this->value = $value;
	}
	
	function value(): int {
		return $this->value;
	}
}

function isOldEnoughToPurchaseItem(UserAge $age, Item $item)
{	
	if ($age->value() >= $item->getMinimumAgeForPurchase()) {
		return true;
	}

	return false;
}

vs

class UserAge extends IntType {}

function isOldEnoughToPurchaseItem(UserAge $age, Item $item)
{
	if ($age >= $item->getMinimumAgeForPurchase()) {
		return true;
	}

	return false;
}

Everything is a trade-off, and making code type-safe with fewer keypresses is a worthwhile one.


Danack commented Feb 10, 2020

@BogdanUngureanu - yeah you're right.

That won't work at all. I'll remove it and see if the idea still makes sense after that.

@ircmaxell

$x = foo(new FirstName());

// $x has type FirstName, not string.

I would strongly advise against this behavior. First, the function was typed against a by-value primitive, and you passed in a by-reference object. This can have non-obvious side-effects, especially if a reference to that primitive is held on to (say if it was using memoization).

Second, it can change the semantics of code, as theoretically the value returned can change over time:

function foo(int $x) {
    $y = $x + 1;
    $z = $x + 1;
    return $y === $z;
}

That could return false if the object passed in for $x had an incrementer in the __value() method.

Finally, it introduces a significant inconsistency with the === operator. For example:

$x = new class extends IntType { protected int $value = 10; };
$y = new class extends IntType { protected int $value = 10; };
foo($x, $y, 10);
function foo(int $x, int $y, int $z) {
    if ($x === $y) throw new Exception("But they are different object instances???");
    if ($x !== $y) throw new Exception("But they both resolve to the same int value");
    if ($x === $z) throw new Exception("But one's an object, and one's an int");
}

Instead, I'd suggest resolving to the value at call time, so that the type is correct inside of the called function...


Danack commented Feb 10, 2020

@ircmaxell - yep agree. Bogdan also pointed out how bogus that would be.
