Undefined, implementation defined and unspecified behaviors in C++

Introduction :

The C++ standard defines how your code should behave. However, if you read it, you will also come across undefined behavior: situations for which the standard imposes no requirements at all. In practice, what your program does in those situations depends on the compiler you are using, its settings and the target platform, and none of it is guaranteed. The standard describes the permitted range as follows:


Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Undefined behavior exists largely so that C and C++ compilers can optimise the generated code: the compiler is allowed to assume that the program never triggers it. You can read about the advantages on the LLVM blog:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

And another explanation here : https://monoinfinito.wordpress.com/2016/06/02/c-why-is-undefinedness-important/
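A commonly cited illustration of that optimisation argument is a comparison involving signed overflow. The sketch below is only an illustration; the function names are made up for this post, and whether a given compiler actually performs the folding depends on the compiler and its flags.

```cpp
// For signed int, overflow in `x + 1` is undefined behavior, so an optimiser
// is allowed to assume it never happens and may fold this to `return true;`.
bool increment_is_larger(int x) {
    return x + 1 > x;
}

// Unsigned arithmetic wraps around by definition, so the comparison has to
// be evaluated for real: it is false when x holds the maximum value.
bool increment_is_larger_unsigned(unsigned x) {
    return x + 1 > x;
}
```

Calling increment_is_larger with the largest int is itself undefined behavior, which is exactly why the compiler does not have to care about that case.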

The most typical examples are reading uninitialised variables, dereferencing a null pointer, signed integer overflow and invalid type casts. As for a general list of undefined behaviors, here is the best collection I could find:

http://stackoverflow.com/a/367662
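Taking signed integer overflow as one of those typical cases, here is a minimal sketch of how the undefined operation can be avoided by checking the operands first. The helper name checked_add is hypothetical and not taken from any library.

```cpp
#include <limits>

// Adds a and b only when the result fits in an int.
// Returns false (and leaves `out` untouched) when the addition would
// overflow, so the undefined operation is never executed.
bool checked_add(int a, int b, int& out) {
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();
    if (b > 0 && a > max - b) return false;  // would exceed the maximum
    if (b < 0 && a < min - b) return false;  // would go below the minimum
    out = a + b;
    return true;
}
```

The checks themselves only use subtractions that cannot overflow, which is the point of writing them in this order.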

Undefined behavior vs Unspecified Behavior vs Implementation Defined Behavior

If you read the standard further, you will notice two more terms: unspecified behavior and implementation-defined behavior. Here is a Stack Overflow question, with answers, about the differences:

http://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior

Here is the definition of implementation-defined behavior:

behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents.

The very last part of the standard document has an index of implementation-defined behaviors. And here is the definition of unspecified behavior:

behavior, for a well-formed program construct and correct data, that depends on the implementation
Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this International Standard.

Briefly, code that has undefined behavior is erroneous. Implementation-defined behavior, on the other hand, is a choice made by the compiler and standard library vendor, such as sizeof(int), and the vendor has to document it. Finally, unspecified behavior is like implementation-defined behavior except that it does not have to be documented; an example is the order in which the arguments to a function are evaluated.
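The two well-defined categories can be observed from a conforming program. The following is a small sketch with made-up helper names (int_size, record, sum, argument_order); it only shows that both outcomes are legal, not which one your compiler picks.

```cpp
#include <cstddef>
#include <string>

// Implementation-defined: the value differs between platforms, and each
// implementation documents its choice.
std::size_t int_size() { return sizeof(int); }

// Appends `tag` to `log` so we can see when this call was evaluated.
int record(std::string& log, char tag) {
    log += tag;
    return 0;
}

int sum(int a, int b) { return a + b; }

// Unspecified: the standard does not say which argument is evaluated first,
// so the returned string may be "ab" or "ba".
std::string argument_order() {
    std::string log;
    (void)sum(record(log, 'a'), record(log, 'b'));
    return log;
}
```

A program that relies on one particular order is still valid C++, but it is not portable, because another compiler (or another optimisation level) may legitimately choose the other order.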

An example of undefined behavior that MSVC and GCC handle differently: we first upcast a derived object through pointers. The base pointer is then assigned back to a derived class pointer and the derived class method is called successfully. After that, however, we make the base pointer point to a base object and assign it to the derived pointer again. Calling a derived class method through a derived class pointer that actually points to a base object is undefined behavior. If you try this, it will work fine with MSVC (probably because the method in the example does nothing specific to the derived class), but if you compile it with GCC it will cause a segmentation fault.
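The scenario described above can be sketched as follows. This is a reconstruction under assumptions, not the original listing: the class names Base and Derived, the member get and the helper checked_downcast are all hypothetical, and the invalid cast is left commented out so that the sketch itself stays well defined.

```cpp
#include <iostream>

struct Base {
    virtual ~Base() = default;  // virtual destructor makes the type polymorphic
};

struct Derived : Base {
    int value = 42;
    int get() const { return value; }
};

// Checked alternative to the unchecked downcast: returns nullptr when
// `base` does not actually point at a Derived object.
Derived* checked_downcast(Base* base) {
    return dynamic_cast<Derived*>(base);
}

// Walks through both situations and returns the value read through the
// valid downcast.
int demo() {
    Derived d;
    Base* base = &d;                                 // upcast: always fine
    Derived* derived = static_cast<Derived*>(base);  // fine: object is a Derived
    int result = derived->get();

    Base b;
    base = &b;  // base now points at a plain Base object
    // Undefined behavior if uncommented, because the object is not a Derived:
    // derived = static_cast<Derived*>(base);
    // derived->get();
    if (checked_downcast(base) == nullptr) {
        std::cout << "not a Derived, call skipped\n";
    }
    return result;
}
```

Using dynamic_cast costs a run-time check, but it turns the silent undefined behavior into a null pointer that can be tested.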

Interesting consequences of undefined behaviors

An interesting example is a variable that appears to be both true and false at the same time because it is read uninitialised, which is a quite common form of undefined behavior:

http://markshroyer.com/2012/06/c-both-true-and-false/

Another interesting example is from the Linux kernel: https://isc.sans.edu/diary/A+new+fascinating+Linux+kernel+vulnerability/6820

What happens in that one is :

1. First a variable is initialised by dereferencing the pointer tun, which can be NULL.

struct sock *sk = tun->sk; // initialize sk with tun->sk

2. Then there is a null check on tun, and if it is NULL the code has to return an error code:

if (!tun)
    return POLLERR; // if tun is NULL return error

3. However, the compiler optimised the code: having seen tun dereferenced in the initialisation, it assumed that tun could not be NULL and deleted the NULL check.

4. This can lead the kernel to read or write data at address 0x00000000, which can be mapped to user-land memory.

5. And here you can see how it can be exploited :
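To make steps 1 to 3 concrete, here is a minimal sketch of the ordering problem and its fix. It is not the exploit and not the kernel code: the structs, the function poll_checked and the constant kPollErr are hypothetical stand-ins invented for this illustration.

```cpp
// Hypothetical stand-ins for the kernel types; not the real definitions.
struct sock { int id; };
struct tun_struct { sock* sk; };

// Hypothetical error value standing in for POLLERR.
constexpr unsigned int kPollErr = 0x0008;

// Problematic ordering, kept as a comment because running it with a null
// pointer would be undefined behavior:
//     sock* sk = tun->sk;          // dereference: compiler may assume tun != NULL
//     if (!tun) return kPollErr;   // so this check may be optimised away
//
// Safe ordering: test the pointer first, dereference afterwards.
unsigned int poll_checked(const tun_struct* tun) {
    if (!tun) return kPollErr;
    const sock* sk = tun->sk;
    return sk ? 0u : kPollErr;
}
```

With the check placed before the first dereference, the compiler has no grounds to assume the pointer is non-null, so the check cannot be removed.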

Ubsan : Undefined behavior sanitizer

GCC (starting from 4.9) and Clang have “ubsan” (undefined behavior sanitizer). If you build with the -fsanitize=undefined flag, many kinds of undefined behavior will be reported at run time.

Here is the simplest example, dereferencing a null pointer. When you compile and execute it, you will get a segmentation fault (memory access violation):

http://coliru.stacked-crooked.com/a/001178588381592e

If you then add -fsanitize=undefined: http://coliru.stacked-crooked.com/a/d4c9040456f31156

This time you will also get :

main.cpp:7:14: runtime error: load of null pointer of type 'int'
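The linked snippets are not reproduced here, so the following is only a sketch of the kind of code that leads to such a report. The function name load is hypothetical, and the exact file, line and column in the message will differ from the one quoted above.

```cpp
// Reads the int that `p` points to.
// Passing a null pointer is undefined behavior; in a build made with
// -fsanitize=undefined the sanitizer is expected to report the load
// at run time.
int load(int* p) {
    return *p;
}
```

Calling load with the address of a real int is fine. It is a call such as load(nullptr) that a sanitized build would flag with a "load of null pointer" runtime error.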

In another example at http://coliru.stacked-crooked.com/a/fe6bcd958be237ac , we have a function that is declared to return a value but never does. Ubsan reports it as:

main.cpp:1:5: runtime error: execution reached the end of a value-returning function without returning a value

As for an example use case, here you can see how it is used for testing Clang's libc++:

https://cplusplusmusings.wordpress.com/2013/03/26/testing-libc-with-fsanitizeundefined/

FURTHER RESOURCES

A nice stackoverflow answer regarding types of UBs : https://stackoverflow.com/questions/367633/what-are-all-the-common-undefined-behaviours-that-a-c-programmer-should-know-a

Here is a one and a half hour video from BoostCon about undefined behavior : https://www.youtube.com/watch?v=uHCLkb1vKaY&index=75&list=UU5e__RG9K3cHrPotPABnrwg

A CPPCon video : https://www.youtube.com/watch?v=g7entxbQOCc

A CPPCon video presented by Chandler Carruth : https://www.youtube.com/watch?v=yG1OZ69H_-o

A great article about UB in C++ : https://blog.regehr.org/archives/1520?utm_source=newsletter_mailer&utm_medium=email&utm_campaign=weekly
