PHP is usually included in the top five or six most popular programming languages, as measured by various metrics implemented by e.g. TIOBE, LangPop, PYPL, and lang-index. Alongside it sit C, Java, Objective-C, C++, C#, JavaScript, and Python. All of these have a formal semantics or at least a rigorous specification. C has ANSI and ISO specifications, much work on formal semantics, and even a formally verified compiler. Java has a language specification and a formal subset, "Featherweight Java". Objective-C has some specification in the form of its C subset, and decent documentation. C++, similarly, has C as a formally defined subset, is defined in an ISO standard, and has some work on formalizing fragments of it. C# has an ECMA standard and at least one paper formalizing it. JavaScript is really ECMAScript, which has a specification, and some work on the essence of JavaScript formalizes it and builds a reference interpreter. Python has an operational semantics.
PHP is notably different. It has no specification other than an informal and sparse "language reference". It is said to be defined by a reference implementation: the complex and optimized Zend interpreter, written in C.
Many language features can be understood as syntactic sugar. This creates a smaller core language, with fewer syntactic forms to which we must assign semantics.
All variables are looked up dynamically in the environment, and the name being looked up can itself be computed at run time: if variable "x" maps to value v in the environment, and expression e evaluates to "x", then the expression ${e} evaluates to v. Variables can also be assigned dynamically: if e1 evaluates to "x", then the expression ${e1} = e2 assigns the value of e2 to the variable named "x".
This means that $x can be understood as syntactic sugar for ${'x'}, both as an expression and as the target of an assignment (or reference assignment).
As well as $x, PHP variables can take the forms $$x, $$$x, etc., and even $$${'x'}. The $ in this case is a prefix operator and associates to the right, like $($($x)). The form $$$x is sugar for ${${${'x'}}}.
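The desugaring of variable names can be checked directly in PHP. The following is a minimal sketch; the variables $x, $y, $z, $e, and $ab are hypothetical names chosen for illustration, arranged so that each variable holds the name of the next.

```php
<?php
// Hypothetical chain of variables: each one holds the name of the next.
$x = 'y';
$y = 'z';
$z = 42;

// $x used as an expression is sugar for ${'x'}.
$plain   = $x;
$dynamic = ${'x'};

// The name may be any expression that evaluates to a string.
$e = 'x';
$viaExpression = ${$e};

// $$$x is sugar for ${${${'x'}}}: 'x' -> 'y' -> 'z' -> 42.
$tripleSugar    = $$$x;
$tripleExpanded = ${${${'x'}}};

// Dynamic assignment: ${e1} = e2 assigns to the variable named by e1.
${'a' . 'b'} = 'assigned dynamically';

var_dump($plain === $dynamic);               // bool(true)
var_dump($tripleSugar === $tripleExpanded);  // bool(true)
```

After the last assignment, the variable $ab exists even though the name "ab" never appears literally as a variable in the source.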
Several control structures allow one to omit braces; this is purely syntactic:
if (e) s1; ==> if (e) { s1; }
if (e) s1; else s2; ==> if (e) { s1; } else { s2; }
while (e) s1; ==> while (e) { s1; }
etc.
else if is just sugar; the else associates to the right:
if (e1) { s1; } else if (e2) { s2; } else { s3; }
==>
if (e1) { s1; } else { if (e2) { s2; } else { s3; } }
PHP provides a keyword elseif, which is semantically identical to else if. (The Zend implementation apparently optimizes it differently, but this is unimportant.)
The statement if (e) { s1; } is just sugar for the more general if (e) { s1; } else {}.
do-while should be understood as primitive, and not as
do { s; } while (e); ==> s; while (e) { s; }
since the block may contain break statements which skip to the end of the loop. In the unrolled first copy of s, such a break is no longer inside the loop it was meant to exit.
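To see why the unrolling is unsound, here is a small sketch that nests the loop inside an outer foreach, so that a misplaced break has something else to exit. The function names withDoWhile and withNaiveUnrolling are hypothetical, introduced only for this comparison.

```php
<?php
// The real do-while: break leaves only the inner loop.
function withDoWhile(): array
{
    $log = [];
    foreach ([1, 2] as $outer) {
        $i = 0;
        do {
            $log[] = 'body ' . $outer;
            if ($i === 0) {
                break; // exits the do-while only
            }
            $i++;
        } while ($i < 3);
        $log[] = 'after ' . $outer;
    }
    return $log;
}

// The naive unrolling "s; while (e) { s; }".
function withNaiveUnrolling(): array
{
    $log = [];
    foreach ([1, 2] as $outer) {
        $i = 0;
        // First copy of s, now outside any inner loop.
        $log[] = 'body ' . $outer;
        if ($i === 0) {
            break; // exits the enclosing foreach instead
        }
        $i++;
        while ($i < 3) {
            $log[] = 'body ' . $outer;
            if ($i === 0) {
                break;
            }
            $i++;
        }
        $log[] = 'after ' . $outer;
    }
    return $log;
}

var_dump(withDoWhile() === withNaiveUnrolling()); // bool(false)
```

The first function records the body and the trailing statement for both outer iterations; the second stops after the very first body. At the top level of a script, with no enclosing loop at all, the unrolled break would instead be rejected by PHP.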
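Since code blocks do not introduce a new scope, for can be desugared to while:
for (e1; e2; e3) { s; } ==> e1; while (e2) { s; e3; }
This holds only if s contains no continue statement: in a for loop, continue still evaluates e3 before re-testing e2, whereas in the while form it would skip e3.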
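A minimal sketch of this desugaring, for a loop body that contains no continue (a continue in the while form would jump past the trailing e3). The variable names are hypothetical. It also shows that the loop counter remains visible after the loop, which is why hoisting e1 out of the loop changes nothing.

```php
<?php
// for (e1; e2; e3) { s; }
$sumFor = 0;
for ($i = 0; $i < 5; $i++) {
    $sumFor += $i;
}
// $i is still defined here: the loop did not introduce a new scope.
$iAfterFor = $i;

// e1; while (e2) { s; e3; }
$sumWhile = 0;
$j = 0;
while ($j < 5) {
    $sumWhile += $j;
    $j++;
}

var_dump($sumFor === $sumWhile); // bool(true)
var_dump($iAfterFor === $j);     // bool(true)
```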
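The manual implies that both forms of foreach are just sugar in terms of reset (which rewinds the array's internal pointer) and each (an internal function that returns the current key/value pair and advances the pointer). Note that each was deprecated in PHP 7.2 and removed in PHP 8.0, so the desugared form below only runs on older versions: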
foreach ($a as $v) { s; } ==> reset($a); while (list( , $v) = each($a)) { s; }
foreach ($a as $k => $v) { s; } ==> reset($a); while (list($k, $v) = each($a)) { s; }
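Because each() is not available in PHP 8, the following sketch substitutes key(), current(), and next(), which together do what each() did: read the pair under the internal pointer, then advance. This substitution is an assumption made so the example runs on current PHP; it is not the form the manual gives. The array contents and variable names are hypothetical.

```php
<?php
$a = ['one' => 1, 'two' => 2, 'three' => 3];

// foreach ($a as $k => $v) { s; }
$viaForeach = [];
foreach ($a as $k => $v) {
    $viaForeach[] = $k . '=' . $v;
}

// reset($a); while (list($k, $v) = each($a)) { s; }
// with each() spelled out as key() + current() + next().
$viaPointer = [];
reset($a);
while (($k = key($a)) !== null) { // key() returns null past the end
    $v = current($a);
    next($a);
    $viaPointer[] = $k . '=' . $v;
}

var_dump($viaForeach === $viaPointer); // bool(true)
```

The two loops agree on this simple case. Whether they agree when the body modifies $a or moves its internal pointer is a separate question that this sketch does not address.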
The double-quoted string can be syntactically transformed into an expression using only single-quoted strings, e.g.
"foo $bar baz" ==> 'foo ' . $bar . ' baz'
(This step is indeed taken by the Zend interpreter before execution.)
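The string transformation can be checked with a direct comparison. A minimal sketch, with a hypothetical value for $bar:

```php
<?php
$bar = 'BAR';

// Double-quoted string with interpolation.
$interpolated = "foo $bar baz";

// The same value built from single-quoted strings and concatenation.
$concatenated = 'foo ' . $bar . ' baz';

var_dump($interpolated === $concatenated); // bool(true)
```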
PHP's mechanism for escaping should be seen as echo in disguise:
?>foo bar baz<?php ==> echo 'foo bar baz';
?>foo bar baz[EOF] ==> echo 'foo bar baz';
This has one niggle: the PHP file starts out in escaped mode, so the file can simply be understood to start with a ?>; i.e., the first statement is always an echo. Thus, we can compile out the escaping to a language without escaping:
#!/usr/bin/env php
blah
<?php
echo 'foo';
?>
==>
echo "blah\n";
echo 'foo';
echo '';
This is indeed how Zend compiles the escaped text.
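The claim that escaped text behaves like echo can be observed from within PHP by capturing output with output buffering. This is a small sketch that compares only the observable output, not the compiled opcodes; the variable names are hypothetical.

```php
<?php
// Capture what the escaped (inline) text sends to the output.
ob_start();
?>foo bar baz<?php
$viaInlineText = ob_get_clean();

// Capture what the equivalent echo statement sends to the output.
ob_start();
echo 'foo bar baz';
$viaEcho = ob_get_clean();

var_dump($viaInlineText === $viaEcho); // bool(true)
```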
PHP is described as an "interpreted language". However, this is a misnomer, and PHP can be compiled:
- The Zend "interpreter" compiles a PHP file to bytecode, known as "opcodes", before execution. The compiled opcodes are pleasingly short. Install the "Vulcan Logic Dumper" PHP extension and run
php -dvld.active=1 -dvld.execute=0 file.php
in order to view the compiled opcodes. However, the opcodes are extremely underspecified.
- [HipHop Virtual Machine] compiles PHP to HipHop bytecode (HHBC), which has a much better specification. I don't know if it's possible to view the HHBC for a given file, though.
PHP can therefore be understood by specifying the compilation step, then specifying the semantics for the bytecode.