Skip to content

Instantly share code, notes, and snippets.

@bwoebi
Last active August 29, 2015 14:15
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bwoebi/b4c5564388ecd004ba96 to your computer and use it in GitHub Desktop.
Save bwoebi/b4c5564388ecd004ba96 to your computer and use it in GitHub Desktop.
Do NOT work around weak typehints, but fix them

To start, the current version (0.2) of the scalar type hints RFC is the correct way to do it, if we really want to have two modes (whether declare(strict_types=1) or <?php strict doesn't matter a lot here).

Type hinting in general is useful, no question. We can guarantee in the function body what we types are working with. That avoids bad surprises like working with a non-numeric string when we expect an integer. Very nice.

But two modes? Why? What issue are we solving with strict typing?

Strict typing

Only equal parameters can be passed in. There also a "numeric" type hint was proposed, to solve the issue where we just want a number.

That allows to check that we also only pass in what we expect. Should prevent passing absolutely wrong values in. So, for example

function foo(numeric $bar) {
	return sin($bar) * cos($bar);
}

foo("5 fruits");

That should fail.

Or that great example from Daniel Lowrey:

// Third paramter, expects integer:
// 1: only verify the peer cert HAS a name
// 2: verify the name matches the domain you connected to
// true: auto-casts to 1
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, true)

with current weak hints, that's absolutely dangerous... And found by strict hints.

But at the same time, it has its downsides:

function foo(int $bar) { /* ... */ }
foo((int) $stringContainingInteger);

We need some integer for the function (example: bitmask) when we get it in string form (e.g. database value). So, we need to cast it to get an integer. It just works. It never can fail, no matter what string we pass. It actively hides potential bugs by the string being casted. You may pass in garbage, you still get an integer.

Solution?

Improved weak type casting. Also especially fix internal function casting.

Yes, it will involve a BC break in case of internal functions. I know, we should avoid them, but rather fix it once than never.

Only pass things you can directly convert. (Actually assuming bigints will come, else we make our life harder than needed.) … and which make also sense to pass. Just like it makes no sense to pass true or false to a parameter expecting integer.

When expecting integer, only accept bare numbers. Means reject everything except -?[0-9]+(\.[0-9]*)(e[0-9]*)? (and make attention to not have more digits behind the dot than indicated after the exponent - we want an integer). And if a float is passed, fail if it doesn't represent an exact number.

When expecting a float, accept every integer and numeric string (as long as there's nothing else in the string, but the number. "5 apples" should be disallowed.)

When having a numeric type, use the same rules as for float, but if possible, return an integer rather than a float.

When expecting a boolean, accept an integer (0 == false, everything else is true) or a boolean. (Integers should be accepted, isn't it handy to just pass in a literal 1 instead of true?)

When expecting an array, only accept array.

When expecting a string, accept ints, floats and strings. (booleans and arrays don't really produce useful values here)

When being passed null, just cast null to the specific type.

This simple set of rules is relatively basic, yet it covers nearly every need of implicit casting. It fixes everything why we would need strict types. It fits all the examples above. Even when you want to pass a string to an integer type hint, it just allows valid integers, not every garbage, like when you'd have to do an explicit cast due to strict hints. Also, it probably will break nearly no PHP code which is not already broken in some way.

Is there any reason why we'd need a stricter set than what I just proposed here?

Conclusion

Fix weak type casts, add weak scalar type hints based on them. And we don't need strict type hints at all.

That's mainly why I voted no on the RFC. Not because it is a bad RFC nor a bad concept, it just tries to fix the problem the wrong way. It is a workaround around the too relax current type casting, which basically is the root problem to should be solved.

I think, working around the core problem is definitely not the way to go. Fix the core problem, then we don't need strict scalar hints (as weak hints already will cover everything strict hints aim to solve) and unicorns stop dying so fast.

The proposed rules, resumed

  • Strings which aren't numbers shouldn't be accepted for floats.
  • Strings which don't exactly represent an integer (either pure digits with eventual sign or resulting expression which an exponent shouldn't have any decimal part after expanding) shouldn't be accepted for integer.
  • Only accept floats for ints if they're exactly representable as int (means you should do the rounding yourself, relying on truncation often is risky).
  • For floats accept strings if they're representing a number or an integer. (like java, widening)
  • For numeric accept everything what float accepts, but if losslessly possible cast to int.
  • For strings accept float and integer.
  • For bool accept integer and float.
  • So what breaks compared to now:
  • Strings which aren't strictly numeric are rejected to int, float and numeric.
  • Floats, if they don't exactly represent an integer are rejected to int. (or rather: reject if there exists a nearer representation to the integer)
  • Booleans are rejected to everything else.
@hikari-no-yume
Copy link

There are real-world use cases for both strict and weak scalar types. If you tighten weak scalar types, you break existing weakly-typed code, and you don't do anything for people for whom the type is important, like anyone dealing with JSON data or the non-weakly-typed outside world - no matter how much you tighten weak hints, they still aren't as useful as strict hints for error checking. The old Scalar Type Hints with Casts RFC tried to do what you're suggesting. It failed. It upset people who need weak typing, because it was too strict and didn't match the old behaviour, and it didn't help people who need strict typing. The current RFC offers both weak and strict types because both are useful, and some people prefer one, some prefer the other. I don't want a compromise that pleases nobody. I want a compromise that gives both groups what they want.

I previously posted a case study of me porting my little web app to use strict scalar types everywhere. Weak types would only have helped me with one of the issues that strict hints helped me spot - and that's only if they were significantly tightened. Strict types have their place.

@bwoebi
Copy link
Author

bwoebi commented Feb 11, 2015

As I said on twitter:

I care about BC. I also sometimes underestimate impact. But sometimes… progress requires a step backward (in BC). ~ https://twitter.com/bwoebi/status/565648405628977153

The RFC didn't fail either, it was withdrawn. There were a few loud voices and a few points you couldn't exactly agree how to handle them best. Really, we just need a clear and simple set of rules what works.

I'm well aware that strict types may have their benefits, but do they really have their place in PHP? I don't know. But what definitely is a bad idea, is working around without fixing the root issue. Fix the issues from core up, not establish the workaround first and later notice that it is too late to remove it.
When we have fixed the issues I described in my post, then we can discuss if really we need strict types.

I'm sure a lot of people only voted yes on the RFC because it is the best they can currently get. Not because they really want strict types.

@bramus
Copy link

bramus commented Feb 12, 2015

As someone who's not familiar with how the voting system in the PHP community actually works I am wondering why your argumentation made here wasn't noted somewhere along the RFC before the vote started?

Above that I see that the voting system only allows for "yes" or "no", but not a "no, but ..." option – an option you would have voted for so it seams. Therefore I'm wondering if in case the vote results in a "no", does that mean that the whole idea is scrapped (and cannot be re-suggested), or does it mean "the vote does not let the RFC pass in its current state", allowing a new vote at a later time? (I guess it's the latter, as there's a follow-up vote suggesting to reserve the words in case the main vote fails)

@flaupretre
Copy link

Hi,

What's you opinion on what we should allow in numeric strings ? leading blanks, leading zeroes, trailing blanks (just blanks not any char) ? The current STH RFC removes support for all of this but I am not sure it shouldn't be kept...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment