What worries me about the Scalar Types RFC

At the time of typing this and with only four days left, the scalar types RFC is barely being rejected by 66:34 (needs two thirds to pass). Some of the votes against are from folks who don’t want scalar type hints at all. I also think some people may be voting in favor because they expect the type hinting implementation to solve problems it’s not really meant to solve: type hinting is not validation (the “word” is not the “meaning”), and content-safety is only a side effect of type-safety.

I’ve been following the internals discussion, the threads on Reddit, and asking some questions around.
I think the proposal is clever and having scalar types would be great. You can go to Anthony Ferrara’s blog to read about a lot of good reasons in favor of this RFC.

I still feel that in its current form (as far as I understand it) the proposal can add more effort and introduce more issues than expected, so I decided to write up my concerns.

1. Autocasting on weak mode

The proposal states that when a hinted function is called in weak mode, the arguments will be cast. The intent is that, if you were using functions and having them work without minding types, you’d still be able to do so even after adding hints.

In practice, this still means the behavior of the hinted function changes. Without hints, the function can still inspect the argument as-is.

  • If you pass "2" to an unhinted function that will use it as an integer, the function still has the chance to call gettype and see that it was given a string.
  • If you pass "2cats", the unhinted function can see the "cats".
  • If you hint the argument as int and call the function from a weak types context, the function cannot see the "cats", and the caller will see a Notice.
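The first two bullets can be sketched in plain unhinted PHP. The function name describeArgument is made up for illustration; gettype is the built-in mentioned above.

```php
<?php
// Hypothetical unhinted function: it receives the argument exactly as passed.
function describeArgument($value)
{
    // gettype() reports the original type, so a string stays visible as a string.
    return gettype($value) . ':' . var_export($value, true);
}

echo describeArgument("2"), "\n";      // string:'2'
echo describeArgument("2cats"), "\n";  // string:'2cats'
echo describeArgument(2), "\n";        // integer:2
```

Once the argument is hinted as int and cast before the function body runs, only the last form would ever reach the body.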

Now, normally you wouldn’t hint an integer if any of the functionality depended on reading string characters from that argument. But I know there’s going to be a temptation to replace validation checks with a hint, which guarantees the function will run, and then someone else using that function and relying on it stopping bad input will find themselves having to change their code.

  • Case #1: The function detected the invalid argument and returned null, which was then detected by the caller. After adding a hint, the function always succeeds. It sometimes throws a Notice for lossy casts, but the caller had no code in place to catch it.
  • Case #2: The function detected the invalid argument and threw an Exception, which was then caught by the caller in a try... catch block. After adding a hint, the function throws a Notice, which is not caught by try... catch.
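Here is a rough sketch of what the two cases look like before any hint is added. All function names are invented for the example, and the check in looksLikeInt is only one of many ways a library might validate by hand.

```php
<?php
// Hypothetical hand-written check shared by both callees.
function looksLikeInt($value)
{
    return is_int($value)
        || (is_string($value) && preg_match('/^-?\d+$/D', $value) === 1);
}

// Case #1: bad input is reported by returning null.
function doubleOrNull($n)
{
    return looksLikeInt($n) ? (int) $n * 2 : null;
}

// Case #2: bad input is reported by throwing.
function doubleOrThrow($n)
{
    if (!looksLikeInt($n)) {
        throw new InvalidArgumentException('Not an integer: ' . var_export($n, true));
    }
    return (int) $n * 2;
}

// The caller for Case #1 relies on the null.
if (doubleOrNull("2cats") === null) {
    echo "rejected\n";
}

// The caller for Case #2 relies on the exception.
try {
    doubleOrThrow("2cats");
} catch (InvalidArgumentException $e) {
    echo "caught: ", $e->getMessage(), "\n";
}
```

If the hand-written check were swapped for an int hint with the weak-mode casting described above (as far as I understand the RFC), neither the "rejected" branch nor the catch block would run any more, and the callers would have to change.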

Andrea Faulds has suggested adding a “number” type hint. If it casts only valid strings to numeric types (like is_numeric) and throws an Exception that can be caught by try... catch, that would help. But adding a hint will still count as a code-breaking change due to potential loss of precision.
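To make that idea concrete, here is a userland approximation. This is only my guess at the behavior I would want, not what any RFC specifies; toNumber is a hypothetical helper, while is_numeric and InvalidArgumentException are standard PHP.

```php
<?php
// Hypothetical userland stand-in for a "number" hint; not part of any RFC.
function toNumber($value)
{
    if (is_int($value) || is_float($value)) {
        return $value;
    }
    if (is_string($value) && is_numeric($value)) {
        // Adding zero lets PHP pick int or float for a numeric string.
        return $value + 0;
    }
    // Anything else is rejected in a way try... catch can see.
    throw new InvalidArgumentException('Not a number: ' . var_export($value, true));
}

var_dump(toNumber("2"));   // int(2)
var_dump(toNumber("2.5")); // float(2.5)

try {
    toNumber("2cats");
} catch (InvalidArgumentException $e) {
    echo "caught: ", $e->getMessage(), "\n";
}
```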

Personally, I think I would prefer no autocasting to happen. Calling a hinted function in a weak context would completely ignore the hints. This means the callee’s author still has to write code to validate the arguments as if the hint wasn’t there, but this is OK since content-safety is not type-safety.

This doesn’t decrease the essential value of scalar type hints, which is to allow static analysis and optimizations; you can’t quite ask these tools to understand hand-coded type checks, so the hints are there for them. In a strict context, on the other hand, the hint would have the side effect of validation and render the check on the callee’s side redundant, and this is OK too.

The reason you may have redundant checks is that we are allowing hints to be bypassed. In the current RFC, the weak caller will have to add a check before passing an argument to a hinted function, and that check is redundant if the mode is changed to strict. But I feel that check needs to be on the callee’s side and ship with it, because all weak callers are going to need it. Yeah, strict callers will still find themselves handling their arguments before passing them, and then the callee will still do an unnecessary check, but unless we can guarantee both caller and callee will always be the same library, that’s not so strange or undesirable.

And there’s also the consideration that library coders are going to be typically more skilled than library users, so passing all the responsibility to the caller is not ideal.

2. Lossy casts throwing notices

This is a simple concern. If autocasting were to remain in, I think lossy casts should throw fatal errors catchable with try... catch, to prevent safety issues remaining unhandled during the transition to PHP 7.
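For comparison, this is roughly what “catchable” looks like from the caller’s side today if you promote notices to exceptions yourself with set_error_handler and ErrorException (both standard PHP). The helper name withErrorsAsExceptions is made up, and the Notice is simulated with trigger_error rather than produced by an actual cast.

```php
<?php
// Hypothetical helper: runs $fn with notices and warnings promoted to ErrorException.
function withErrorsAsExceptions(callable $fn)
{
    set_error_handler(function ($severity, $message, $file, $line) {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });
    try {
        return $fn();
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
}

try {
    withErrorsAsExceptions(function () {
        // Stand-in for the Notice a lossy cast would raise.
        trigger_error('Lossy cast of "2cats" to int', E_USER_NOTICE);
        return 2;
    });
} catch (ErrorException $e) {
    echo 'caught: ', $e->getMessage(), "\n";
}
```

The catch is that every caller has to opt in to this, which is exactly the kind of thing that gets missed during a transition.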

3. Strict mode declaration

Currently strict mode would be set on a per-file/per-block basis with declare statements. This is very likely to change later.

This would be less of an issue without autocasting, because the functions would need to be resilient to context switching anyway. With autocasting in, adding a declare is a potentially breaking change. In that case, I would very much prefer that weak calls were only enabled by something akin to a try_with_casts... catch block, which would both allow catching lossy casts and make the caller aware that they are now responsible for casting the arguments.

I understand the position that passing this RFC could be the chance to get strict types in before PHP 7, and that it could just be refined later. But I fear parts of it won’t be seen as changeable once code has been written to fit it; people will look at the hinted functions and say “we can’t switch content-safety back to the caller-side, think of all the libraries that replaced validation with hints”. So even though the RFC is very well thought out (otherwise it wouldn’t have the support of folks like Anthony Ferrara or Phil Sturgeon), maybe it would be better to risk PHP 7.0 not having scalar hints.

If it passes, component/library writers could still adopt the practice of exposing only APIs with unhinted scalar arguments and handling the casting themselves. If the scalar types behavior were to change later, it would be easier to fix.
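A small sketch of that practice, with an invented Counter class: the public method takes an unhinted argument and does its own normalization, so its behavior does not depend on what mode the caller is in.

```php
<?php
// Hypothetical library class; the name and API are made up for illustration.
final class Counter
{
    private $total = 0;

    // Public API: no scalar hint, the method validates and casts by itself.
    public function add($amount)
    {
        if (is_string($amount) && preg_match('/^-?\d+$/D', $amount) === 1) {
            $amount = (int) $amount; // integer-looking strings are accepted
        }
        if (!is_int($amount)) {
            throw new InvalidArgumentException('add() expects an integer or an integer string');
        }
        $this->total += $amount;
        return $this->total;
    }
}

$counter = new Counter();
echo $counter->add("2"), "\n"; // 2
echo $counter->add(3), "\n";   // 5
```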

Rust and semantic overload

A discussion on Rust and readability made me wonder why I feel weirded out by some of the code. I went to browse hyper’s GitHub and stared at:

pub fn new(mut stream: &'a mut (Reader + 'a), addr: SocketAddr) -> HttpResult<Request<'a>> {

There’s hardly anything more common than a two-argument method, yet somehow it felt (to someone coming from dynamic languages) a bit like code golf. My particular issue was that the explicitness adds a lot of semes (units of meaning) to the declaration:

“Public function new, with arguments: stream, which is a mutable borrowed reference with the 'a lifetime and contains a Reader plus the 'a lifetime; and addr, which is a SocketAddr, and must return an HttpResult of type Request with lifetime 'a.”

That’s definitely a lot of different things to say in a single sentence. And then, if I’m not looking directly at it, the line doesn’t have the typical shape of anything because it has so many components.

Writing it in multiple lines makes a huge difference for me:

pub fn new(
    mut stream: &'a mut (Reader + 'a),
    addr: SocketAddr
) -> HttpResult<Request<'a>> {

It looks a bit unorthodox for a function declaration, but then, with all the explicitness, a declaration in Rust practically comes with half a docblock included, so it’s not really an outrageous “waste of lines”.

I still kind of wish mutability wasn’t indicated with a separate term, but sigil soup isn’t all that desirable either.

Return types on PHP 7 voted (again)

A new vote on return types is ongoing.

Differences from Past RFCs

This proposal differs from past RFCs in several key ways:

  • The return type is positioned after the parameter list. See Position of Type Declaration for more information about this decision.
  • We keep the current type options. Past proposals have suggested new types such as void, int, string or scalar; this RFC does not include any new types. Note that it does allow self and parent to be used as return types.
  • We keep the current search patterns. You can still search for function foo to find foo’s definition; all previous RFCs broke this common workflow.
  • We allow return type declarations on all function types. Will Fitch’s proposal suggested that we allow it for methods only.
  • We do not modify or add keywords. Past RFCs have proposed new keywords such as nullable and more. We still require the function keyword.


Discussion on Reddit.