taming-cpp-motivation.md (7211B)
1 # Taming C++, episode 1: motivation 2 3 C++ is pretty much the standard language in the software industry for 4 any project where performance matters. I have used it a fair bit in 5 the past - during my [IOI](https://ioinformatics.org) time and for 6 a university course - but I used it only as "C with 7 [STL](https://en.wikipedia.org/wiki/Standard_Template_Library)". I have 8 also reviewed it for my job interview a couple of years ago, but I have 9 not touched it since then. Given that I am at the early stages of my 10 career as a software developer and that I am interested in performance, 11 it's about time I learn modern C++! 12 13 ## Motivation 14 15 As much as I like programming and learning new stuff, this is something 16 that requires some effort. And it turns out that keeping a copy of 17 [A Tour of C++](https://www.stroustrup.com/tour3.html) on my desk and 18 looking at its cover from time to time does not provide sufficient 19 motivation to take up this new task. 20 21 Starting a new project would be a good way to start playing with a new 22 language, but at the moment I would rather keep working on my ongoing 23 projects than start a new one. Writing a (series of) blog post(s) 24 where I share my experience with the rest of the world seemed like a 25 good alternative, but I needed some extra push to get started. 26 27 Finally, a few weeks ago I have discovered Bert Hubert's series of posts 28 [*Modern C++ for C programmers*](https://berthub.eu/articles/posts/c++-1). 29 The first post in that series showed an example that got my attention: 30 for a task as simple as sorting a list of integers, using their respective 31 standard libraries, C++ outperforms C by a significant margin! 32 33 ## Performance 34 35 To get started with modern C++, I decided to repeat Bert's experiment. 36 You can find the code I wrote in 37 [this repository](https://git.tronto.net/taming-cpp). 38 39 To sort a list of a hundred million integers, in C we can use `qsort()` 40 from `stdlib.h` (see 41 [sort.c](https://git.tronto.net/taming-cpp/file/sort-benchmark/integers/sort.c.html)): 42 43 ``` 44 qsort(a, ARRAYSIZE, sizeof(int), compar); 45 ``` 46 47 Where `compar()` is a comparison function such as 48 49 ``` 50 int compar(const void *x, const void *y) { 51 return *(int *)x - *(int *)y; 52 } 53 ``` 54 55 In C++ we can use `sort()` from `algorithm` (see 56 [sort.cpp](https://git.tronto.net/taming-cpp/file/sort-benchmark/integers/sort.cpp.html)): 57 58 ``` 59 std::sort(a, a+ARRAYSIZE, 60 [](const int &x, const int &y) { return x < y; }); 61 ``` 62 63 Here we use a 64 [lambda expression](https://en.cppreference.com/w/cpp/language/lambda) 65 instead of a comparison function. 66 67 Modern C++ also offers parallelized 68 version of common algorithms as part of the standard library, so if 69 we want to completely humiliate poor C we can use this (see 70 [sort_parallel.cpp](https://git.tronto.net/taming-cpp/file/sort-benchmark/integers/sort_parallel.cpp.html)): 71 72 ``` 73 std::sort(std::execution::par, a, a+ARRAYSIZE, 74 [](const int &x, const int &y) { return x < y; }); 75 ``` 76 77 Parallelizing stuff like this looks almost like cheating, but it wasn't 78 without some struggle - see the "Gotchas" section below. 79 80 So, what is the result? These are the times I get on my 81 [Debian 12 desktop](https://sebastiano.tronto.net/blog/2023-10-15-build-time) 82 (AMD Ryzen 7 7700): 83 84 ``` 85 C time for 100000000 numbers: 9.815525s 86 C++ time for 100000000 numbers: 4.87299s 87 C++ time for 100000000 numbers (parallel): 0.488956s 88 ``` 89 90 Even without parallelization, C++ is twice as fast as C! 91 92 You might think that C++ is somehow optimizing for integer 93 sorting. But this is not the case, as demonstrated by a similar experiment 94 with *pairs* of integers (see 95 [sort.c](https://git.tronto.net/taming-cpp/file/sort-benchmark/pairs/sort.c.html), 96 [sort.cpp](https://git.tronto.net/taming-cpp/file/sort-benchmark/pairs/sort.cpp.html) and 97 [sort_parallel.cpp](https://git.tronto.net/taming-cpp/file/sort-benchmark/pairs/sort_parallel.cpp.html) 98 - I used a non-standard mixed lexicographic order to be reasonably sure the 99 compiler does not come up with any ad-hoc optimization): 100 101 ``` 102 C time for 100000000 pairs: 12.383896s 103 C++ time for 100000000 pairs: 6.50033s 104 C++ time for 100000000 pairs (parallel): 0.713616s 105 ``` 106 107 ## Complexity is not for nothing 108 109 As explained by Bert in his post, the reason 110 the C++ version is faster is that by using 111 [templates](https://en.cppreference.com/w/cpp/language/templates) 112 the compiler can inline the call to the comparison function. C is not 113 as sophisticated, so this is not possible: the only way a standard 114 library function can call your custom code is via a 115 [function pointer](https://en.wikipedia.org/wiki/Function_pointer). 116 Regardless of possible performance issues with pointer 117 dereferencing, calling a function without inlining it always causes 118 some overhead. This is completely negligible for larger tasks, but for a 119 small function that is called millions of times it makes a big difference! 120 121 And all of this without even considering the elephant in the room, that is 122 the fact that with virtually zero extra effort - but again, see the "Gotchas" 123 section below - one can parallelize C++ STL algorithms and make them an order 124 of magnitude faster on modern hardware! In C I would have needed to write a 125 parallel sort on my own with something like 126 [pthreads](https://en.wikipedia.org/wiki/Pthreads). 127 128 Lesson learned for a C developer: sometimes the extra complexity of 129 other languages does bring some benefit. 130 131 ## Gotchas 132 133 I mentioned above that I had some trouble compiling and running the 134 parallel version of my code. In fact it took me almost two hours to make 135 it work! This was for a couple of reasons. 136 137 The first reason is that, at least on Linux with GCC + GLIBC, the C++ 138 standard library requires 139 [TBB](https://en.wikipedia.org/wiki/Threading_Building_Blocks). I 140 figured this out rather early, and it did not take me long to learn 141 that I had to add an `-ltbb` option to my command line either. What 142 took me an incredibly long time to understand is that `-ltbb` had to 143 be put *at the end* of the command line options! To make it clear, 144 something like this: 145 146 ``` 147 $ g++ -O3 -std=c++20 -ltbb -o sort sort_parallel.cpp 148 ``` 149 150 does not work, you have to write 151 152 ``` 153 $ g++ -O3 -std=c++20 -o sort sort_parallel.cpp -ltbb 154 ``` 155 156 Another problem I had was caused by my use of macros. You can see 157 in the source files that I am using an `ARRAYSIZE` macro for the 158 size of the array, and I provide a value for it at compile time. 159 Originally I had called this macro `N`, but this clashed with some 160 internal name used in TBB, and I got all sorts of walls of text 161 of weird template errors - something C++ is infamous for. 162 163 The last issue I had was that I was stupid and tried to compile my 164 C++ code with a C compiler. Indeed, while debugging the issue with 165 `-ltbb` I tried switching to [clang](https://clang.llvm.org), because 166 I sometimes get better error messages with it. But instead of using 167 `clang++` I used `clang`, which only compiles C. 168 169 ## Until next time? 170 171 Now that I have broken the ice with C++ I think it will be easier 172 to continue studying it. I may or may not keep writing about this here: 173 turning this into another blog series may be a good way to force myself 174 to do this regularly, but on the other hand it may not be interesting 175 for my readers. We'll see!