Undefined behavior & sequence points
Assume the following code snippet of test.cpp. Our goal was to read integers from stdin and map them to the order in which they first appeared in our input. Keep in mind this is just a test and probably there are better ways to implement the intended behavior of this cut-down version.
If we use
g++ to compile our code we get the following output. This was not
the output we were expecting.
$ g++ --version g++ (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010 $ g++ test.cpp $ ./a.out 123 -> 1 42 -> 3
If instead we use
clang++ we get the expected output.
$ clang++ --version Ubuntu clang version 3.6.2-1 (tags/RELEASE_362/final) (based on LLVM 3.6.2) Target: x86_64-pc-linux-gnu Thread model: posix $ clang++ test.cpp $ ./a.out 123 -> 1 42 -> 2
Did we just find a bug in
g++? There is a bug, but not in the compiler. The bug is in
our code that invokes undefined behavior.
In a nutshell, an undefined behavior in C or C++, is when the standard does not describe what should happen at certain points of your code. What will eventually happen during runtime, highly depends on the implementation choices made by the compiler. More about undefined behavior (and its two less dangerous brothers, unspecified behavior and implementation-defined behavior) can be found in this stackoverflow post.
The problem is related to another notion as well. That of sequence points. There is a great description of sequence points in C and C++ in this stackoverflow answer. The Standard states
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)
Some examples include the semicolon (
;) at the end of a statement or the and operator in a condition (
a && b).
Another interesting case is that of a function call (after the evaluation of all function arguments (if any)
and before execution of any expressions or statements in the function body). Also, keep in mind that in C++ you
can overload operators. In that case, the operator designates a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
How are those two things related to our problem? Again from the Standard (§5/4)
- Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
- Furthermore, the prior value shall be accessed only to determine the value to be stored.
The second rule means that if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.
So expressions like
i = i + 1 are legal while expressions like
printf("%d %d", i,++i); invoke undefined behavior.
Now we can look at our little bug once more.
From the documentation of
If k does not match the key of any element in the container, the function inserts a new element with that key and returns a reference to its mapped value. Notice that this always increases the container size by one, even if no mapped value is assigned to the element (the element is constructed using its default constructor).
Let's look at our code snippet.
mymap = mymap.size() + 1;
This statement is actually a function call, since we assing to a map object and the overloaded assign operator is invoked.
operator=(mymap, mymap.size() + 1);
But since there is no sequence point between the arguments, there is no guarantee on the order those arguments will be evaluated in.
It seems that
g++ evaluates arguments from left to right while
clang++ does the opposite.
Neither is the "correct" implementation. Our code should have avoided undefined behavior to begin with.