
@satyx
Created June 19, 2022 19:17

The Need for Header Files

Introduction

Almost everyone who has written even a hello-world program in C/C++ has used header files via the include directive (remember how you imported stdio.h or iostream, or used custom header files). But why do we actually need header files? Obviously, to use predefined libraries whenever we want without rewriting everything each time. Hmm yes, that’s true, but do we really need a specific file for that? Why can’t we simply include the .cpp/.c file wherever it is required? That would bring the entire body of the function into our source code, so we could simply use it. The answer to this question might seem extremely obvious to some of us, but let us explore it step by step.

Compilation

Anyone who has written even a simple makefile would know this, but for the sake of completeness, the compilation process consists of multiple steps:

  1. Preprocessor: First the preprocessor processes all the code before compilation takes place, which includes, in layman's terms, copying and pasting the entire contents of header files into the source, removing comments, expanding all the macros, etc.
  2. Compiler: Now comes the crucial part: the compiler translates the preprocessed code into assembly, making necessary optimisations along the way (unless we prevent it via compiler flags).
  3. Assembler: The assembler generates an object file corresponding to the assembly code it receives.
  4. Linker: Finally, the linker generates the executable from the object files received from the assembler.

Now, let’s consider compiling multiple files to understand why we really need a linker.

x.cpp

int foo(){
    return 2;
}

y.cpp

#include <iostream>

int foo();    // forward declaration

int main(){
    std::cout<<foo()<<std::endl;
    return 0;
}

For compiling,

g++ x.cpp y.cpp

Up to step 3, each file is processed individually. The compiler simply generates the object file for x.cpp, but while compiling y.cpp it encounters a call to foo(); the forward declaration makes the compiler believe that such a function exists somewhere, so it doesn’t have to worry about it. The compiler only cares about the signature of the function (it takes no arguments and its return type is int; the call site will receive an integer which eventually gets printed to the console, that’s it) and thus generates an object file for y.cpp. Now the linker comes into action: it takes the object files of x.cpp and y.cpp and generates an executable containing the machine code.

Note that this executable now contains the instructions corresponding to the function foo(), which were absent from the object file of y.cpp.

Another important thing to note here: the order of compilation didn’t matter. Why? Because the object files for the two sources are generated independently, neither requiring the other (well, we did include the iostream header, which must exist, but here let’s just focus on the two files x.cpp and y.cpp). After this step the linker, which now has all the object files, generates the executable.

Header Files

So far we had a single function and only one forward declaration to write. What if there are hundreds or thousands of them? Not only would writing all those declarations be inconvenient, it would be prone to human error. What if there are multiple files which require all the declarations? What if at some point we have to modify the signature of some function to meet a new requirement, and then must update every file where its forward declaration appears? To save us from this evil, here comes our hero into the picture: the header file. We can simply write all the forward declarations in it and include it wherever required.

x.h

int foo();

x.cpp

int foo(){
    return 2;
}

y.cpp

#include <iostream>
#include "x.h"

int main(){
    std::cout<<foo()<<std::endl;
    return 0;
}

Now, while generating the object file for y.cpp, the preprocessor first copy-pastes the forward declaration from x.h into y.cpp, and the rest of the steps follow. Convenient, isn’t it?

Small Twist: Including cpp file instead of header

Coming to the question: what if, rather than including the header file, we include x.cpp? Now the function body is present in the source code itself, and we don’t really need a header file.

y.cpp

#include <iostream>
#include "x.cpp"

int main(){
    std::cout<<foo()<<std::endl;
    return 0;
}

Firstly, can we even do that? It feels wrong somehow; we never did this before, and importing an entire cpp file seems weird. True, but OUR (and let me emphasise, only our) purpose will be served, and the program will compile successfully. As mentioned, include simply brings whatever is present in the named file into the current one before compilation proceeds. So y.cpp would look somewhat like this:

…
int foo(){
    return 2;
}

int main(){
    std::cout<<foo()<<std::endl;
    return 0;
}

In fact, we now don’t even need to compile x.cpp to generate its object file, since its entire code gets automatically imported into y.cpp. We can simply compile using the following command:

g++ y.cpp

Is it good? NO !!! And that has multiple reasons:

  1. Every time we change just the source of y.cpp, we unnecessarily recompile the code of x.cpp as well (because it is included), making the compilation process long. Instead, we can, should, and in fact must have a makefile that regenerates the object file of x only when x has changed. Having a header file saves us from this trouble.
  2. Most importantly, this can generate errors. How? Suppose a third file z.cpp also requires the functions defined in x. If we pull the same stunt of including the cpp file, then while generating the object files for y and z, both of them will contain the function body of foo(). Ultimately, while generating the executable, the linker will be unable to decide which of the definitions (or better, which set of instructions) should be invoked. Boom!!! You will receive an error stating the presence of multiple definitions of foo.

Therefore, rather than including the entire definition, we should use forward declarations, and hence we should use header files and put all the forward declarations there.

What if we add header guards to the cpp file and simply include it, rather than going for header files at all? Note that this error isn’t generated by the compiler but by GNU’s elegant linker, ld. To check, use the -c flag to generate just the object files:

g++ -c x.cpp y.cpp z.cpp 

See, no error! The error was generated by the linker when it tried to produce the executable. Header guards aren’t going to help when the linking process takes place; in fact they come into action in the very first step, preprocessing. So while a guard can prevent the same definition from being included multiple times into the same file (like including x.cpp in p.cpp and q.cpp, and then including both p.cpp and q.cpp into y.cpp), it cannot prevent the definition from being included once into each of multiple files.

Conclusion

Long story short, header files help us write all the forward declarations in one place, which can then be included into multiple files. For the reasons mentioned above, one should not write an entire function body in a header file. But what about inline functions? Why is it recommended to keep an identical inline definition of the function in the header file? Because the compiler must see the entire definition before generating the object file for a call site, one should keep the identical inline function definition in the header body.

Note that an inline function is exempt from the one-definition rule across translation units: identical definitions of it can coexist in several translation units, and the linker merges them into one (it still has external linkage by default; mark it static if you want a translation-unit-local copy). When calling the function, it is up to the compiler whether to actually expand it inline or emit a regular call.

For the same reason, splitting code across translation units prevents the compiler from performing a few optimisations as well, for example when importing a constant external variable (like extern const int x; int y = x*3; here y will be calculated at run time, whereas with the value visible it would have been initialised at compile time), or when using it to initialise a constexpr, which requires the value beforehand.
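The constexpr case can be demonstrated directly. A sketch (the file names consts.cpp, use.cpp, and bad.cpp are made up): the declaration alone carries no value, so it cannot feed a constant expression in another translation unit.

```shell
cat > consts.cpp <<'EOF'
extern const int x = 7;   // definition; extern gives it external linkage
EOF
cat > use.cpp <<'EOF'
extern const int x;       // declaration only: value unknown in this TU
int y = x * 3;            // legal, but initialised at run time
EOF
cat > bad.cpp <<'EOF'
extern const int x;
constexpr int z = x * 3;  // rejected: x isn't a constant expression here
EOF

g++ -c consts.cpp use.cpp            # both compile fine
g++ -c bad.cpp 2>/dev/null \
  || echo "compile failed"           # constexpr needs the value up front
```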
