The last 2 weeks of 8LU were very interesting. Steven Degutis delivered 2 great talks on programming languages, covering the pros and cons of a wide variety of features that we see in programming languages today. Though I agree with pretty much everything that he had to say, I was not all too thrilled when I heard him say that C++ is slow.
What Steven was referring to was the long and dreadful compilation process that some C++ projects exhibit. But why is it that some C++ programs (that are relatively small) take hours to compile? Well, what if I ask you this question: why do some SQL queries (that operate on relatively small datasets) take 30 seconds to run? How about this: why does a 50-piece jigsaw puzzle take forever to complete when one attempts to solve it by jamming random pieces together? The answer to the last question should be fairly obvious; the answer to the first two should be compiling in your head right now.
Most every task can be completed in numerous ways. I can brush my teeth starting with the bottom row, then move on to the top row, and finito. Or if I don't want the germs from the top row falling onto my already-cleaned bottom row, then I can start with the top row and finish with the bottom row.
There are also many ways to structure a C++ project (or a project in any language for that matter). You can be smart about managing your dependencies and have a relatively quick compile time, or you can be lazy and always complain about your compile time. What I will propose here are just a few methods that can help you go fast(er) on your next C++ project.
Forward Declarations
Let's take a look at a simple example. I have a class, Programmer, that uses a Laptop. The header, programmer.h, declares Programmer with a single method, writeSoftwareOn, that takes a Laptop; it includes no other files.
Try to compile this and you'll get a "Laptop not declared" error (or something along those lines). How can we fix this? Include the laptop.h header, which should declare your Laptop class.
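A minimal sketch of what the two headers might look like at this point. The members install and installed, and the choice to pass the Laptop by reference, are assumptions made for illustration. Both files are shown in one block, with the #include written as a comment, so the sketch compiles as a single file.

```cpp
// ---- laptop.h (hypothetical contents) ----
#include <string>
#include <vector>

class Laptop {
public:
    void install(const std::string& software) { installed.push_back(software); }

    bool hasInstalled(const std::string& software) const {
        for (const std::string& name : installed) {
            if (name == software) return true;
        }
        return false;
    }

private:
    std::vector<std::string> installed;
};

// ---- programmer.h ----
// #include "laptop.h"   // the quick fix: drags in everything above

class Programmer {
public:
    // Only mentions Laptop; nothing here looks inside it.
    void writeSoftwareOn(Laptop& laptop);
};
```

Note that every file that includes this programmer.h also pays for <string>, <vector>, and whatever else laptop.h pulls in.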
Error gone - beautiful, what's next?
Not so fast. Is that really beautiful? All you're doing is declaring a method that takes a Laptop object and nothing else. You're not using the laptop in any way - you just need to know that it exists. So are you really going to include that entire laptop.h file and be done with it? This means that each time one of your clients includes your programmer.h, they will also automatically include laptop.h. And if laptop.h does something similar (includes another file, that includes another file, etc.), you can see this recursion leading to a point where pretty soon you're loading and parsing 50 or 100 files just to be able to use one class. Then we begin to wonder, "Why does this compile so slowly?"
Alternatives? Well, for something as simple as a method declaration where we don't need to know anything about Laptop (other than that it exists), all that we really need to do is tell the compiler, "Hey, this class really exists - for real", and you can do that by forward declaring the class at the top of your file, like so:
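A minimal sketch of the forward-declared header, assuming (as an illustration, not as the original listing) that writeSoftwareOn takes its Laptop by reference:

```cpp
// ---- programmer.h ----
class Laptop;  // forward declaration: "a class named Laptop exists"

class Programmer {
public:
    // Declaring a function that takes a Laptop needs only the name,
    // not the full definition, so there is no #include "laptop.h" here.
    void writeSoftwareOn(Laptop& laptop);
};
```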
That's all folks! Now whenever the compiler goes through your programmer.h file, it'll see the Laptop class declaration at the top and won't need to load any other files. When you or someone else needs to include this programmer.h file, the compiler won't have to go and fetch any other files.
No Other Way Out
When is a forward declaration not possible and we absolutely must include that laptop.h header file? One situation is when we need to call methods on a Laptop. Another is when the compiler needs to know the exact size (in bytes) of a Laptop object.
The first scenario - if you're defining your functions somewhere and your functions are calling methods on Laptop, then you need the entire laptop.h file to verify the signatures of the methods you want to call. So inside our programmer.cpp, where we are defining writeSoftwareOn, we would have something like this:
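A minimal sketch of such a programmer.cpp. The body of writeSoftwareOn, the install method, and the contents of laptop.h are assumptions for illustration; only the hasInstalled name comes from the text. The headers are inlined, with the #include lines shown as comments, so the block compiles as a single file.

```cpp
// ---- laptop.h (hypothetical contents) ----
#include <string>
#include <vector>

class Laptop {
public:
    void install(const std::string& software) { installed.push_back(software); }

    bool hasInstalled(const std::string& software) const {
        for (const std::string& name : installed) {
            if (name == software) return true;
        }
        return false;
    }

private:
    std::vector<std::string> installed;
};

// ---- programmer.h ----
class Laptop;  // the forward declaration is still all the header needs

class Programmer {
public:
    void writeSoftwareOn(Laptop& laptop);
};

// ---- programmer.cpp ----
// #include "programmer.h"
// #include "laptop.h"   // required here: methods are called on Laptop below

void Programmer::writeSoftwareOn(Laptop& laptop) {
    // The compiler checks this call against the declaration of hasInstalled.
    if (!laptop.hasInstalled("compiler")) {
        laptop.install("compiler");
    }
}
```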
Your implementation may vary, but the thing to get out of this is that the compiler needs to read the laptop.h file so that it can make sure that Laptop does in fact have a public method called hasInstalled that takes one parameter, a std::string (or const char*), and returns a bool. If you were not calling any methods on Laptop here, then you would not need to include laptop.h. Forward declaring Laptop in your programmer.h file would be sufficient.
Moving forward - another case where you will need to include a header file of another class is when the compiler needs to know how much memory an instance of that object will occupy. The simplest case is when you compose an object inside another object by value.
Example:
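A minimal sketch of composition by value. The data members of both classes are invented for illustration, and the #include is shown as a comment so the block compiles as a single file.

```cpp
// ---- laptop.h (hypothetical contents) ----
class Laptop {
private:
    double screenInches = 13.3;   // illustrative members only
    long diskGigabytes = 256;
    int memoryGigabytes = 8;
};

// ---- programmer.h ----
// #include "laptop.h"   // required: the member below is a whole Laptop

class Programmer {
private:
    Laptop laptop;                // stored inside every Programmer, by value
    int yearsOfExperience = 0;
};

// A Programmer contains a Laptop, so it can never be smaller than one.
static_assert(sizeof(Programmer) >= sizeof(Laptop),
              "Programmer embeds a whole Laptop");
```

Replacing the Laptop definition with a bare forward declaration would make this fail to compile, because the compiler could no longer work out sizeof(Programmer).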
Since laptop is not a reference or a pointer but the actual Laptop object itself, any time you instantiate a Programmer the compiler will reserve enough memory to store the Laptop along with it. So, for example, if an instance of Laptop needs 24 bytes of memory and the rest of Programmer needs 8 bytes, every instance of Programmer will occupy a chunk of memory that's 32 bytes (ignoring any padding) - the Laptop will be stored in 24 of those bytes. To reiterate, the reason why the header file for Laptop is necessary is so that the compiler can look at what's "inside" a Laptop and determine how many bytes it will need to store one of those alongside a Programmer.
If you're composing by reference or pointer, the composed object will live elsewhere (not alongside your object), so the compiler simply needs to allocate enough memory to store the reference itself (whose size is the same for all classes) and does not need to know the size of the composed object.
Example:
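A minimal sketch of composition by reference and by pointer. The Intern class and the hasLaptop method are hypothetical, added only to show the pointer flavour.

```cpp
// ---- programmer.h ----
class Laptop;  // forward declaration only; laptop.h is not included

class Programmer {
public:
    // The referenced Laptop lives elsewhere and must outlive this object.
    explicit Programmer(Laptop& laptop) : laptop(laptop) {}

private:
    Laptop& laptop;   // reference member: its size does not depend on Laptop
};

// The pointer flavour works the same way (hypothetical second class).
class Intern {
public:
    explicit Intern(Laptop* laptop = nullptr) : laptop(laptop) {}
    bool hasLaptop() const { return laptop != nullptr; }

private:
    Laptop* laptop;   // pointer member: also fine with an incomplete type
};
```

Any code that actually creates a Laptop to pass in will still need laptop.h, but that cost is paid only by the files that do so.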
The forward declaration of Laptop is sufficient because the instance variable laptop on Programmer is a reference type. The actual Laptop object will be created elsewhere and the reference must be set in Programmer's constructor. The main point to get here is that whether the referenced class is a Laptop or Bear or SomeOtherClassThatYouDefine, the reference will always take up the same amount of space, which the compiler knows ahead of time, so it can do the right thing without the entire definition. Same thing goes for pointer types.
Free is Good
Two 8th Lighters walk into a bar. Upon sitting down, the bartender hands each of them a glass of water and asks them what else they would like. 8th Lighter #1 orders a Long Island Iced Tea and 8th Lighter #2 orders a glass of water. Which 8th Lighter will get their drink first?
How about this one: your project has 2 source files - laptop.cpp and programmer.cpp. You compile both of them, run your tests, then make some changes to programmer.cpp. Which files must you re-compile before running your tests again?
The answer should be fairly obvious: only programmer.cpp. Nothing changed in laptop.cpp, so what do you get out of re-compiling the same piece of code? When compiled, each source file is turned into a binary object file. After that, all your object files are linked together into the final program. Linking is a separate step that still needs to happen, but if we can skip the compilation of certain files then why not take advantage of that?
Avoiding useless re-compiling is not an easy task to accomplish. You need to be able to identify which source files already have their corresponding object files built and whether those object files are newer than the source files. Furthermore, if you want to avoid re-compiling a .cpp file, you need to make sure that all the .h files that it depends on have not been modified since your .cpp was last compiled. Why is that? Let's go back to the Programmer and Laptop example from before, where Programmer composed Laptop by value and we were forced to include laptop.h inside programmer.h (and therefore programmer.cpp). In this example, even if you made no changes to programmer.cpp but you touched laptop.h, you must still recompile programmer.cpp because it depends on laptop.h. There is no cheap way to tell whether or not the changes in laptop.h will affect the output of compiling programmer.cpp (were functions deleted? did Laptop's size change?), so to be safe both programmer.cpp and laptop.cpp must be recompiled.
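The rule described above can be sketched in a few lines of C++17. This is only an illustration of the timestamp comparison, not how any particular build tool is implemented; the names isStale and needsRecompile are hypothetical.

```cpp
#include <chrono>
#include <filesystem>
#include <optional>
#include <vector>

namespace fs = std::filesystem;
using Timestamp = fs::file_time_type;

// Core rule: rebuild when there is no object file yet, or when any
// dependency (the .cpp itself or a header it includes) is newer than it.
bool isStale(const std::optional<Timestamp>& objectTime,
             const std::vector<Timestamp>& dependencyTimes) {
    if (!objectTime) return true;
    for (const Timestamp& modified : dependencyTimes) {
        if (modified > *objectTime) return true;
    }
    return false;
}

// Convenience wrapper that reads the timestamps from disk.
// fs::last_write_time throws fs::filesystem_error if a dependency is missing.
bool needsRecompile(const fs::path& objectFile,
                    const std::vector<fs::path>& dependencies) {
    std::optional<Timestamp> objectTime;
    if (fs::exists(objectFile)) {
        objectTime = fs::last_write_time(objectFile);
    }
    std::vector<Timestamp> dependencyTimes;
    for (const fs::path& dependency : dependencies) {
        dependencyTimes.push_back(fs::last_write_time(dependency));
    }
    return isStale(objectTime, dependencyTimes);
}
```

For the example in the text, one would call something like needsRecompile("programmer.o", {"programmer.cpp", "programmer.h", "laptop.h"}); touching laptop.h makes the result true even though programmer.cpp itself is unchanged.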
At this point, you may be seeing another advantage of forward declarations. Had we not had the need to include laptop.h inside programmer.cpp, then we would have avoided the need to re-compile programmer.cpp because of a change in laptop.h. You may also be seeing how keeping your classes loosely coupled and highly cohesive helps you out here. If your classes don't depend on the entire world, then they will need to be re-compiled less frequently. If your classes are highly cohesive, then changes in your software will be localized to just a few places and not trigger an avalanche of re-compiles with every change.
This process of "compiling only if something changed" is not automatic. If you're on the command line and throw a bunch of .cpp files at the compiler, it will compile all of them; it won't try to figure out dependencies for you. You will have to use a build tool like make to explicitly manage your dependencies. If you don't want to learn make, spend an evening or two with your IDE and investigate what options you have there.
Pre-Compiled Headers
This is a nice solution that many compilers have to the problem of having to include the same set of header files in nearly every file you have. It's particularly useful to speed up the compilation of your test suite where each of your test classes must make use of some other classes defined in your test framework. If you have good tests, you're likely to have a lot of classes that depend on a bunch of header files from your test framework.
Now, chances are that each of your test files will end up including thousands of lines of code from a number of files in your test framework, just so that you can use that test framework. This can add up very quickly, so even if you do a mighty fine job of managing your resources, your compile time will still shoot through the roof just because you need to include your test framework header files everywhere.
Solution? Compilers have an option of pre-compiling header files. Essentially how this works is the compiler compiles a header file (and all files included by that header file) into something that is faster and easier to process the second time around. So it compiles the header file once and whenever it comes across an include for that same header file again it will use the pre-compiled header (instead of going through 50 files and thousands of lines of code each time it sees that header included somewhere).
Methods for doing this vary per compiler, so I won't go into any more detail here. Just remember that something like this exists and that it is a big time saver when you have an external library, framework, or system header files that almost never change but you use in many places.
Compiler Optimizations
Now that you've got all this extra time on your hands due to blazing fast builds, stay with me for this one last, small, but nevertheless compile-time-saving tip. Most every C++ compiler in the world performs some sort of optimization, and you can specify various levels of it. These optimizations do take up extra time during the compilation process, so make sure to disable them (if they are enabled by default) when compiling frequently during development. By all means, do turn them on for production releases and for CI builds, but during development the run time that those optimizations will save you is not worth the extra time it takes the compiler to make them.
A Few Exceptions
Without going into any detail - when you start working with templated classes and functions all bets are off. Templates are compiled differently under the hood and the rules of the game are different. Different story, perhaps I'll cover that on a different day. But I will briefly add, if you do end up working with templates and they're killing your compilation time, spend some time researching pre-compiled headers for a possible solution.