Skip to content

Instantly share code, notes, and snippets.

@gkastrinis
Last active July 8, 2018 11:16
Show Gist options
  • Save gkastrinis/22c27578cfa093837880 to your computer and use it in GitHub Desktop.
Save gkastrinis/22c27578cfa093837880 to your computer and use it in GitHub Desktop.
Unspecified (or undefined) behavior & sequence points in C++

Unspecified (or undefined) behavior & sequence points

Assume the following code snippet of test.cpp. Our goal was to read integers from stdin and map them to the order in which they first appeared in our input. Keep in mind this is just a test and probably there are better ways to implement the intended behavior of this cut-down version.

If we use g++ to compile our code we get the following output. This was not the output we were expecting.

$ g++ --version
g++ (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010
$ g++ test.cpp
$ ./a.out
123 -> 1
42 -> 3

If instead we use clang++ we get the expected output.

$ clang++ --version
Ubuntu clang version 3.6.2-1 (tags/RELEASE_362/final) (based on LLVM 3.6.2)
Target: x86_64-pc-linux-gnu
Thread model: posix
$ clang++ test.cpp
$ ./a.out
123 -> 1
42 -> 2

Did we just find a bug in g++? There is a bug, but not in the compiler. The bug is in our code that invokes unspecified behavior.

In a nutshell, an unspecified behavior in C or C++, is when the standard does not describe what should happen at certain points of your code. What will eventually happen during runtime, highly depends on the implementation choices made by the compiler. More about undefined behavior (and its two more and less dangerous brothers, undefined behavior and implementation-defined behavior) can be found in this stackoverflow post.

The problem is related to another notion as well. That of sequence points. There is a great description of sequence points in C and C++ in this stackoverflow answer. The Standard states

At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)

Some examples include the semicolon (;) at the end of a statement or the and operator in a condition (a && b). Another interesting case is that of a function call (after the evaluation of all function arguments (if any) and before execution of any expressions or statements in the function body). Also, keep in mind that in C++ you can overload operators. In that case, the operator designates a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.

How are those two things related to our problem? Again from the Standard (§5/4)

  1. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
  1. Furthermore, the prior value shall be accessed only to determine the value to be stored.

The second rule means that if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.

So expressions like i = i + 1 are legal while expressions like printf("%d %d", i,++i); invoke undefined behavior.

Now we can look at our little bug once more.

From the documentation of map::operator[]

If k does not match the key of any element in the container, the function inserts a new element with that key and returns a reference to its mapped value. Notice that this always increases the container size by one, even if no mapped value is assigned to the element (the element is constructed using its default constructor).

Let's look at our code snippet.

mymap[42] = mymap.size() + 1;

This statement is actually a function call, since we assing to a map object and the overloaded assign operator is invoked.

operator=(mymap.at(42), mymap.size() + 1);

But since there is no sequence point between the arguments, there is no guarantee on the order those arguments will be evaluated in.

It seems that g++ evaluates arguments from left to right while clang++ does the opposite. Neither is the "correct" implementation. Our code should have avoided undefined behavior to begin with.

#include <iostream>
#include <map>
using namespace std;
int main() {
map<int, int> mymap;
mymap[123] = 1;
mymap[42] = mymap.size() + 1;
cout << 123 << " -> " << mymap[123] << endl;
cout << 42 << " -> " << mymap[42] << endl;
}
@shafik
Copy link

shafik commented Oct 27, 2017

Nice post! It is always good to see people trying to educate others about undefined behavior.

The case with map though is not undefined behavior, it is more benign unspecified behavior. Although still probably not intended and easy to fall into. See my Stackoverflow answer to "Order of evaluation of assignment statement in C++" for more details.

Note, in C++11 the language changed and it no longer uses sequence points but ordering of evaluations: Sequenced before, unsequenced and Indeterminately sequenced. This case is still unspecified behavior before C++11 but the quotes from the standard are different. In this case what we care about is that there is a sequence point after all the function arguments are evaluated. In C++03 §[intro.execution] says

When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body. [...]

@gkastrinis
Copy link
Author

Thanks for your comment! I tried to rectify the text a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment