Analysis of some weird evaluation order in PHP

Order of evaluation in PHP

Yesterday I found some people on my favorite subreddit wondering about the output of the following code:

<?php

$a = 1;
$c = $a + $a++;
var_dump($c); // int(3)

$a = 1;
$c = $a + $a + $a++;
var_dump($c); // int(3)

As you can see, the expressions $a + $a++ and $a + $a + $a++ produce the same result, which is rather unexpected. What's happening here?

Operator precedence and associativity

At this point many people seem to think that the order in which an expression is evaluated is determined by operator precedence and associativity. But that's not true. Precedence and associativity only tell you how the expressions are grouped:

<?php

// in the first expression
$a + $a++;
// "++" has higher precedence than "+", so "$a++" is grouped:
$a + ($a++);

// in the second expression
$a + $a + $a++;
// "++" again has higher precedence than "+":
$a + $a + ($a++);
// and "+" is a left-associative operator, so the left "+" is grouped:
($a + $a) + ($a++);

What does this tell us about the order of evaluation? Nothing. Operator precedence and associativity specify grouping, but they do not specify in which order the groups are executed. In the last example either ($a + $a) or ($a++) could run first.
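To make the difference between grouping and evaluation order visible, here is a small sketch. The closure `$operand` is a hypothetical helper (it is not part of the original example) that records the moment each operand is evaluated. The value of the expression follows from the grouping, which is guaranteed. The contents of `$log` show the order your PHP build happened to choose, which is not.

```php
<?php

$log = [];

// Hypothetical helper: records its label when evaluated, then returns the value.
$operand = function (string $label, int $value) use (&$log): int {
    $log[] = $label;
    return $value;
};

// "*" has higher precedence than "+", so this is grouped as A + (B * C).
$result = $operand('A', 1) + $operand('B', 2) * $operand('C', 3);

var_dump($result);              // int(7), because 1 + (2 * 3)
echo implode(' ', $log), "\n";  // the order in which the operands were evaluated
```

Note that the grouping puts B * C "first", yet nothing forces B and C to be evaluated before A.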

PHP does not specify what will actually happen. One version of PHP can give you one result and a different version another. Don't write code that depends on some particular evaluation order.
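One way to follow that advice is to move the increment into its own statement, so that every read of $a happens at an obvious point. This is only a sketch. The names `$before`, `$after` and `$d` are mine, and which of the two intents you actually want depends on your code.

```php
<?php

// Intent 1: add the old value to itself, then increment.
$a = 1;
$c = $a + $a;   // 1 + 1
$a++;
var_dump($c);   // int(2)

// Intent 2: add the incremented value to the old value.
$a = 1;
$before = $a;   // 1
$a++;
$after = $a;    // 2
$d = $after + $before;
var_dump($d);   // int(3)
```

Both versions give the same result regardless of how a particular PHP version orders the operands of +.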

CV optimization

Even though PHP does not define an order, it would still be interesting to know why you get that rather odd result in the first code sample (this result is consistent across all recent PHP versions).

The reason behind it is the "compiled variables" (CV) optimization that was introduced in PHP 5.1. This optimization basically comes down to allowing simple variables (like $a, but not $a->b or $a['b']) to directly act as operands of an opcode. (Opcodes are what PHP generates from your script and what the Zend VM executes. Every opcode has at most two operands and an optional result.)

Now, let's look at the opcodes generated by the two code snippets. We'll start with $a + $a + $a++:

// code:
$a = 1;
$c = ($a + $a) + ($a++);

// opcodes:
         ASSIGN   $a, 1
$tmp_1 = ADD      $a, $a
$tmp_2 = POST_INC $a
$tmp_3 = ADD      $tmp_1, $tmp_2
         ASSIGN   $c, $tmp_3

The generated opcodes should be rather intuitive: First assign $a = 1, add $a + $a and store the result in $tmp_1, then perform a post-increment on $a and store the result in $tmp_2, then add both temporary variables and assign the result to $c.

The evaluation here happened left-to-right (first $a + $a was run, then $a++) as you would probably expect. Now let's look at the $a + $a++ case:

// code:
$a = 1;
$c = $a + ($a++);

// opcodes:
         ASSIGN   $a, 1
$tmp_1 = POST_INC $a
$tmp_2 = ADD      $a, $tmp_1
         ASSIGN   $c, $tmp_2

As you can see, in this case the POST_INC ($a++) happens first and the value of $a is only read after that in the ADD opcode. Why? Because reading the value of a variable does not require an extra opcode. Any opcode can handle reading the value of a simple variable. This is what the CV optimization does.
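The two opcode listings above can be replayed by hand in plain PHP, one statement per opcode. This is only a model for illustration (the real VM does not execute PHP statements), and the names `$c1` and `$c2` are mine. It shows why both expressions end up at 3.

```php
<?php

// Model of the opcodes for ($a + $a) + ($a++)
$a = 1;
$tmp_1 = $a + $a;           // ADD $a, $a: 1 + 1 = 2
$tmp_2 = $a; $a = $a + 1;   // POST_INC $a: result is the old value 1, $a becomes 2
$tmp_3 = $tmp_1 + $tmp_2;   // ADD $tmp_1, $tmp_2: 2 + 1 = 3
$c1 = $tmp_3;               // ASSIGN

// Model of the opcodes for $a + ($a++)
$a = 1;
$tmp_1 = $a; $a = $a + 1;   // POST_INC $a: result is the old value 1, $a becomes 2
$tmp_2 = $a + $tmp_1;       // ADD $a, $tmp_1: $a is read only now, so 2 + 1 = 3
$c2 = $tmp_2;               // ASSIGN

var_dump($c1, $c2);         // int(3) int(3)
```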

Avoiding the CV optimization

There are some (rare) circumstances in which the CV optimization is not performed, e.g. when the @ error suppression operator is in use.

Let's try it out. We use the $a + $a++ expression again, but this time put an @ in front of it:

<?php

$a = 1;
@ $c = $a + $a++;
var_dump($c); // int(2)

With the error-suppression operator present, the result suddenly becomes 2 rather than 3. To figure out why, let's look at the opcodes once again:

         ASSIGN        $a, 1
$tmp_1 = BEGIN_SILENCE
$var_3 = FETCH_R       'a'
$tmp_4 = POST_INC      $a
$tmp_5 = ADD           $var_3, $tmp_4
$var_2 = FETCH_W       'c'
         ASSIGN        $var_2, $tmp_5
         END_SILENCE   $tmp_1

Several things changed here: Firstly, everything is now wrapped in BEGIN_SILENCE and END_SILENCE opcodes for handling of @. Those are of no interest to us. Secondly, $a and $c are now fetched using FETCH_R (fetch for read) and FETCH_W (fetch for write) rather than being used directly as operands.

Because the fetch of $a now has an actual opcode, the fetch will happen before the increment and as such the result changes.
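The same hand replay works for the opcode listing with FETCH_R above. As an assumption made purely for illustration, FETCH_R is modelled here as taking a copy of the current value. The silence opcodes are left out because they do not affect the result.

```php
<?php

$a = 1;
$var_3 = $a;                // FETCH_R 'a': the value 1 is read here, before the increment
$tmp_4 = $a; $a = $a + 1;   // POST_INC $a: result is the old value 1, $a becomes 2
$tmp_5 = $var_3 + $tmp_4;   // ADD $var_3, $tmp_4: 1 + 1 = 2
$c = $tmp_5;                // FETCH_W 'c' followed by ASSIGN

var_dump($c);               // int(2)
```

The only difference from the CV case is the point at which the left operand is read.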

Takeaway

If you take anything away from this, let it be these two things:

  • Don't rely on order of evaluation within an expression. It is undefined.
  • @ disables CV optimizations and as such hurts performance. @ also hurts performance in other ways.

~nikic

Nice article (and thanks to the Reddit people for bringing it up)!
How would you measure the performance impact of using @ (or, better said, of not using compiled variables)? Has that been measured and published somewhere? Do you know of any place with more information on that?

Your "favorite" subreddit? Hah, thanks for the laugh, and thanks for fighting the good fight. Great write-up.

Great write-up. This explains so much!

very interesting, thanks a lot!

Thanks for the article. I never thought a simple arithmetic calculation could be this complicated in PHP.

+1 for a simple explanation of something I've seen in the wild with PHP but never understood why. :)

Great article, thank you. But I have a question:

// and "+" is a left-associative operator, so the left "+" is grouped:
($a + $a) + ($a++);

Why is the first $a grouped with the second $a, and not some other way? For instance, if I have $a + $a + $a + $a + $a..., how will the groups be created in that case? Thanks.

@khaletskiy "+" is a left-associative operator (that's specified in the operator precedence/associativity docs), which means that for operators of equal precedence (here both "+" operators have the same precedence) the grouping happens from left to right.

Examples for both cases:

$a + $a + $a + $a + $a
// + is left-assoc, so it's grouped as
(((($a + $a) + $a) + $a) + $a)

$a = $b = $c = $d = $e
// = is right-assoc, so it's grouped as
($a = ($b = ($c = ($d = $e))))
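With concrete numbers the grouping becomes easy to check. The sketch below swaps in "-" and "**" for the operators above, because with them a different grouping would give a different value. The variable names are only for illustration.

```php
<?php

// "-" is left-associative: grouped as (10 - 4) - 3, not 10 - (4 - 3).
$left = 10 - 4 - 3;
var_dump($left);                        // int(3)
var_dump($left === (10 - 4) - 3);       // bool(true)

// "**" is right-associative: grouped as 2 ** (3 ** 2), not (2 ** 3) ** 2.
$right = 2 ** 3 ** 2;
var_dump($right);                       // int(512)
var_dump($right === 2 ** (3 ** 2));     // bool(true)
```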

Hello everyone, I want to ask something about "&" in PHP. Example:
<?php
$a=1;
$b=&$a;
echo ($a++)+(++$a); //the output is 4
?>

<?php
$a=1;
$b=&$a;
echo (++$a)+($a++); //the output is 5
?>

<?php
$a=1;
//$b=&$a;
echo (++$a)+($a++); //the output is 4
?>

<?php
$a=1;
$b=&$a;
echo (++$a)+(++$a); //the output is 6
?>

My question: why are the results of the cases above so different?

Thanks for this excellent explanation. Must admit that I find it disturbing that PHP is diverging from the way that the increment operators evaluate in C (see http://codepad.org/Ca65faQ1). I understand that the CV makes things faster, but if it results in more confusion is the speed gain worth it?
