taming-cpp-motivation.md (7321B)
1 # Taming C++, episode 1: motivation 2 3 *This post is part of a [series](../../series)* 4 5 C++ is pretty much the standard language in the software industry for 6 any project where performance matters. I have used it a fair bit in 7 the past - during my [IOI](https://ioinformatics.org) time and for 8 a university course - but I used it only as "C with 9 [STL](https://en.wikipedia.org/wiki/Standard_Template_Library)". I have 10 also reviewed it for my job interview a couple of years ago, but I have 11 not touched it since then. Given that I am at the early stages of my 12 career as a software developer and that I am interested in performance, 13 it's about time I learn modern C++! 14 15 ## Motivation 16 17 As much as I like programming and learning new stuff, this is something 18 that requires some effort. And it turns out that keeping a copy of 19 [A Tour of C++](https://www.stroustrup.com/tour3.html) on my desk and 20 looking at its cover from time to time does not provide sufficient 21 motivation to take up this new task. 22 23 Starting a new project would be a good way to start playing with a new 24 language, but at the moment I would rather keep working on my ongoing 25 projects than start a new one. Writing a (series of) blog post(s) 26 where I share my experience with the rest of the world seemed like a 27 good alternative, but I needed some extra push to get started. 28 29 Finally, a few weeks ago I have discovered Bert Hubert's series of posts 30 [*Modern C++ for C programmers*](https://berthub.eu/articles/posts/c++-1). 31 The first post in that series showed an example that got my attention: 32 for a task as simple as sorting a list of integers, using their respective 33 standard libraries, C++ outperforms C by a significant margin! 34 35 ## Performance 36 37 To get started with modern C++, I decided to repeat Bert's experiment. 38 You can find the code I wrote in 39 [this repository](https://git.tronto.net/taming-cpp). 40 41 To sort a list of a hundred million integers, in C we can use `qsort()` 42 from `stdlib.h` (see 43 [sort.c](https://git.tronto.net/taming-cpp/file/sort-benchmark/integers/sort.c.html)): 44 45 ``` 46 qsort(a, ARRAYSIZE, sizeof(int), compar); 47 ``` 48 49 Where `compar()` is a comparison function such as 50 51 ``` 52 int compar(const void *x, const void *y) { 53 return *(int *)x - *(int *)y; 54 } 55 ``` 56 57 In C++ we can use `sort()` from `algorithm` (see 58 [sort.cpp](https://git.tronto.net/taming-cpp/file/sort-benchmark/integers/sort.cpp.html)): 59 60 ``` 61 std::sort(a, a+ARRAYSIZE, 62 [](const int &x, const int &y) { return x < y; }); 63 ``` 64 65 Here we use a 66 [lambda expression](https://en.cppreference.com/w/cpp/language/lambda) 67 instead of a comparison function. 68 69 Modern C++ also offers parallelized 70 version of common algorithms as part of the standard library, so if 71 we want to completely humiliate poor C we can use this (see 72 [sort_parallel.cpp](https://git.tronto.net/taming-cpp/file/sort-benchmark/integers/sort_parallel.cpp.html)): 73 74 ``` 75 std::sort(std::execution::par, a, a+ARRAYSIZE, 76 [](const int &x, const int &y) { return x < y; }); 77 ``` 78 79 Parallelizing stuff like this looks almost like cheating, but it wasn't 80 without some struggle - see the "Gotchas" section below. 81 82 So, what is the result? These are the times I get on my 83 [Debian 12 desktop](https://sebastiano.tronto.net/blog/2023-10-15-build-time) 84 (AMD Ryzen 7 7700): 85 86 ``` 87 C time for 100000000 numbers: 9.815525s 88 C++ time for 100000000 numbers: 4.87299s 89 C++ time for 100000000 numbers (parallel): 0.488956s 90 ``` 91 92 Even without parallelization, C++ is twice as fast as C! 93 94 You might think that C++ is somehow optimizing for integer 95 sorting. But this is not the case, as demonstrated by a similar experiment 96 with *pairs* of integers (see 97 [sort.c](https://git.tronto.net/taming-cpp/file/sort-benchmark/pairs/sort.c.html), 98 [sort.cpp](https://git.tronto.net/taming-cpp/file/sort-benchmark/pairs/sort.cpp.html) and 99 [sort_parallel.cpp](https://git.tronto.net/taming-cpp/file/sort-benchmark/pairs/sort_parallel.cpp.html) 100 - I used a non-standard mixed lexicographic order to be reasonably sure the 101 compiler does not come up with any ad-hoc optimization): 102 103 ``` 104 C time for 100000000 pairs: 12.383896s 105 C++ time for 100000000 pairs: 6.50033s 106 C++ time for 100000000 pairs (parallel): 0.713616s 107 ``` 108 109 ## Complexity is not for nothing 110 111 As explained by Bert in his post, the reason 112 the C++ version is faster is that by using 113 [templates](https://en.cppreference.com/w/cpp/language/templates) 114 the compiler can inline the call to the comparison function. C is not 115 as sophisticated, so this is not possible: the only way a standard 116 library function can call your custom code is via a 117 [function pointer](https://en.wikipedia.org/wiki/Function_pointer). 118 Regardless of possible performance issues with pointer 119 dereferencing, calling a function without inlining it always causes 120 some overhead. This is completely negligible for larger tasks, but for a 121 small function that is called millions of times it makes a big difference! 122 123 And all of this without even considering the elephant in the room, that is 124 the fact that with virtually zero extra effort - but again, see the "Gotchas" 125 section below - one can parallelize C++ STL algorithms and make them an order 126 of magnitude faster on modern hardware! In C I would have needed to write a 127 parallel sort on my own with something like 128 [pthreads](https://en.wikipedia.org/wiki/Pthreads). 129 130 Lesson learned for a C developer: sometimes the extra complexity of 131 other languages does bring some benefit. 132 133 ## Gotchas 134 135 I mentioned above that I had some trouble compiling and running the 136 parallel version of my code. In fact it took me almost two hours to make 137 it work! This was for a couple of reasons. 138 139 The first reason is that, at least on Linux with GCC + GLIBC, the C++ 140 standard library requires 141 [TBB](https://en.wikipedia.org/wiki/Threading_Building_Blocks). I 142 figured this out rather early, and it did not take me long to learn 143 that I had to add an `-ltbb` option to my command line either. What 144 took me an incredibly long time to understand is that `-ltbb` had to 145 be put *at the end* of the command line options! To make it clear, 146 something like this: 147 148 ``` 149 $ g++ -O3 -std=c++20 -ltbb -o sort sort_parallel.cpp 150 ``` 151 152 does not work, you have to write 153 154 ``` 155 $ g++ -O3 -std=c++20 -o sort sort_parallel.cpp -ltbb 156 ``` 157 158 Another problem I had was caused by my use of macros. You can see 159 in the source files that I am using an `ARRAYSIZE` macro for the 160 size of the array, and I provide a value for it at compile time. 161 Originally I had called this macro `N`, but this clashed with some 162 internal name used in TBB, and I got all sorts of walls of text 163 of weird template errors - something C++ is infamous for. 164 165 The last issue I had was that I was stupid and tried to compile my 166 C++ code with a C compiler. Indeed, while debugging the issue with 167 `-ltbb` I tried switching to [clang](https://clang.llvm.org), because 168 I sometimes get better error messages with it. But instead of using 169 `clang++` I used `clang`, which only compiles C. 170 171 ## Until next time? 172 173 Now that I have broken the ice with C++ I think it will be easier 174 to continue studying it. I may or may not keep writing about this here: 175 turning this into another blog series may be a good way to force myself 176 to do this regularly, but on the other hand it may not be interesting 177 for my readers. We'll see! 178 179 *Next in the series: [RAII](../2024-12-26-taming-cpp-raii)*