learned-rewrite.md (19219B)
1 # Things I learned rewriting a project from scratch 2 3 Around two years ago I [wrote](../2023-04-10-the-big-rewrite) about a 4 project I have been working on and off for a few years. In that post 5 I explain how and why I decided to rewrite it from scracth, splitting 6 it in multiple parts. I ended up not following exactly the plan that I 7 laid out, and I also worked on the rewrite more slowly than I wanted. 8 But rewriting from scratch allowed me to experiment with new ideas and 9 discover new tricks, and after two years I learned enough stuff that I 10 can make a blog post about it! 11 12 The project I am talking about is [Nissy](https://nissy.tronto.net), a 13 command-line Rubik's Cube solver that is quite popular in the very small 14 niche of speedcubers interested in the *Fewest Moves Challenge*, which 15 consists in solving the puzzle with the smallest amount of moves rather 16 than as fast as possible. Both the old and the new versions are written 17 in C, but not all the things I discuss in this post are language-specific. 18 19 You can find the new version of this project, the "Big Rewrite", 20 in [this repository](https://git.tronto.net/h48) - or 21 [on github](https://github.com/sebastianotronto/h48), if you prefer. 22 I decided to rename it "H48" back when I planned to have separate projects 23 for the different features of the old version, but I may change it back 24 to Nissy at some point. 25 26 But now let's get into it! 27 28 ## Better memory safety (yes, even in C) 29 30 Let's start with a mistake that I think many inexperienced C programmers 31 make: in the previous iterations of this project, I used a lot of 32 `malloc()`s. And although I was quite diligent in always freeing the 33 memory I used, in most cases I would have been better off not using 34 dynamic memory allocation at all. 35 36 As an example, let's consider the case of converting some data structure 37 into a string. One way to do it would be something like this: 38 39 ``` 40 char *mytype_to_string(const mytype_t *x) { 41 const size_t str_size = 512; 42 char *str = malloc(str_size); 43 44 /* Write stuff to str here */ 45 46 return str; 47 } 48 ``` 49 50 If you are familiar with 51 [garbage-collected languages](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)), 52 you may find the code above nice and clean. However, since with C we have 53 to manage our memory manually, the caller of this function would have 54 to remember to `free()` the pointer returned by `my_type_to_string()`. 55 This is not rocket science, but it is a common source of mistakes. 56 57 However, a safer pattern for this kind of function would be the following: 58 59 ``` 60 int mytype_to_string(const mytype_t *x, size_t len, char buffer[len]) { 61 const size_t str_size = 512; 62 if (len < str_size) { 63 /* Handle error in some way, for example by returning -1 */ 64 } 65 66 /* Write stuff to buffer here */ 67 68 return 0; /* Or return number of bytes written */ 69 } 70 ``` 71 72 In this new version of the code, instead of returning a string, the 73 function takes a `char` buffer and its lenght as a parameter. Notice 74 that we are **not** passing the buffer by value, as the notation 75 `buffer[len]` might suggest: this function's signature is equivalent to 76 `my_type_to_string(const mytype_t *, size_t, char *)`, but by using the 77 square-bracket notation the compiler may be able to catch a too-small 78 buffer and warn us about it. 79 80 Now we can get rid of the malloc entirely, because the caller can 81 just allocate the char buffer on its stack: 82 83 ``` 84 void doing_stuff(mytype_t x) { 85 const size_t len = 1024; 86 char str[len]; 87 if (mytype_to_string(&x, len, str) != 0) { 88 /* Handle error */ 89 } 90 91 /* ... */ 92 } 93 ``` 94 95 Unfortuntely, this new pattern leads to a small problem: the caller 96 must now be aware of the number of characters required for converting 97 `mytype_t` to a string. There are a couple of ways to solve this problem, 98 including handling a `NULL` first parameter as a "dry-run" request that 99 returns the required size without writing anything, or using a global 100 constant. 101 102 This brings me to another feature I have started using more and more 103 recently. Say we want to use a global constant for the size of `mytype_t`. 104 We could do something like this: 105 106 ``` 107 #define MYTYPE_SIZE 512 108 109 int mytype_to_string(const mytype_t *x, char buffer[static MYTYPE_SIZE]) { 110 /* Write to buffer here, no error handling needed */ 111 112 return 0; /* Or return number of bytes written */ 113 } 114 115 void doing_stuff(mytype_t x) { 116 char str[MYTYPE_SIZE]; 117 mytype_to_string(&x, str); 118 119 /* ... */ 120 } 121 ``` 122 123 Notice how I used `[static MYTYPE_SIZE]` to denote the required size 124 of the buffer parameter. This tells the compiler that the function 125 `mytype_to_string()` can assume that the buffer must reference an area 126 of memory of size at least `MYTYPE_SIZE`. In turn, the compiler can 127 use this information not only to optimize the code, but also to give 128 useful warnings. This trick can also be used to tell the compiler that 129 a pointer cannot be `NULL`: a pointer is not `NULL` if it refers to an 130 valid area of memory of size at least 1. So we can declare a function 131 as follows: 132 133 ``` 134 int mytype_to_string(const mtytype_t x[static 1], char buffer[static MYTYPE_SIZE]) { 135 /* ... */ 136 } 137 ``` 138 139 With this we can come up with a set of rules to never de-reference a 140 NULL pointer. A pointer must be either: 141 142 1. Obtained by de-referencing an object allocated in current function. 143 2. A parameter declared as `[static 1]`. 144 3. Checked for `NULL` before de-referencing. 145 146 Having more memory safety baked into the language would be more convenient, 147 but at least we can use some mitigations. 148 149 ## Sanitizers 150 151 Another tool that help me improve the memory management of my code are 152 [sanitizers](https://en.wikipedia.org/wiki/Code_sanitizer). 153 They were suggested to me by [Tomas Rokicki](https://tomas.rokicki.com) 154 when I shared my struggles with 155 [an old bug](../2023-05-05-debug-smartphone), and now that I got used 156 to them I am wondering how I even survived programming in C without them. 157 Sanitizers are *compiler intrumentations* supported by the major C and 158 C++ compilers - namely GCC and Clang. They add some runtime checks to 159 a program that make it crash with a clear message if they detect 160 a memory error, such as an access outside the bounds of an array, or a 161 race condition. For example, say you try to access some out-of-bound 162 element of an array: 163 164 ``` 165 /* This is a.c */ 166 167 #include <stdio.h> 168 169 int main() { 170 int a[4] = {0, 1, 2, 3}; 171 172 printf("Out-of-bound value: %d\n", a[4]); 173 } 174 ``` 175 176 Normally, this program just runs fine and prints some garbage value: 177 178 ``` 179 $ cc a.c && ./a.out 180 Out-of-bound value: 1573265008 181 ``` 182 183 Notice how the compiler did not even give us a warning! 184 However, if we compile with the address sanitizer enabled: 185 186 ``` 187 $ gcc -fsanitize=address a.c 188 $ ./a.out 189 ================================================================= 190 ==15152==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fc4e2500030 at pc 0x000000401331 bp 0x7ffdffd953c0 sp 0x7ffdffd953b8 191 READ of size 4 at 0x7fc4e2500030 thread T0 192 #0 0x401330 in main (/home/sebastiano/a.out+0x401330) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) 193 #1 0x7fc4e4a10247 in __libc_start_call_main (/lib64/libc.so.6+0x3247) (BuildId: f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93) 194 #2 0x7fc4e4a1030a in __libc_start_main_alias_1 (/lib64/libc.so.6+0x330a) (BuildId: f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93) 195 #3 0x4010d4 in _start (/home/sebastiano/a.out+0x4010d4) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) 196 197 Address 0x7fc4e2500030 is located in stack of thread T0 at offset 48 in frame 198 #0 0x4011a5 in main (/home/sebastiano/a.out+0x4011a5) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) 199 200 This frame has 1 object(s): 201 [32, 48) 'a' (line 4) <== Memory access at offset 48 overflows this variable 202 HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork 203 (longjmp and C++ exceptions *are* supported) 204 SUMMARY: AddressSanitizer: stack-buffer-overflow (/home/sebastiano/a.out+0x401330) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) in main 205 Shadow bytes around the buggy address: 206 0x7fc4e24ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 207 0x7fc4e24ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 208 0x7fc4e24ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 209 0x7fc4e24fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 210 0x7fc4e24fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 211 =>0x7fc4e2500000: f1 f1 f1 f1 00 00[f3]f3 00 00 00 00 00 00 00 00 212 0x7fc4e2500080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 213 0x7fc4e2500100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 214 0x7fc4e2500180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 215 0x7fc4e2500200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 216 0x7fc4e2500280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 217 Shadow byte legend (one shadow byte represents 8 application bytes): 218 Addressable: 00 219 Partially addressable: 01 02 03 04 05 06 07 220 Heap left redzone: fa 221 Freed heap region: fd 222 Stack left redzone: f1 223 Stack mid redzone: f2 224 Stack right redzone: f3 225 Stack after return: f5 226 Stack use after scope: f8 227 Global redzone: f9 228 Global init order: f6 229 Poisoned by user: f7 230 Container overflow: fc 231 Array cookie: ac 232 Intra object redzone: bb 233 ASan internal: fe 234 Left alloca redzone: ca 235 Right alloca redzone: cb 236 ==15152==ABORTIN 237 ``` 238 239 A bit overwhelming, but I certainly prefer this over getting random 240 garbage results! 241 242 Similarly, one can check for undefined behavior (`-fsanitize=undefined`), 243 data races (`-fsanitize=thread`), memory leaks (`-fsanitize=leak`) and 244 more. If you never used sanitizers, you should start now! 245 246 Sanitizers is that they are *runtime* checks. This means that your 247 program running fine with sanitizers enabled does not guarantee the 248 absence of errors. Moreover, a program compiled with sanitizers enabled 249 will run much slower than normal - so you should only use them in your 250 debug builds. 251 252 It would be amazing to have this kind of checks performed at compile 253 time - I wonder if this is this "Rust" thing all the cool kids are 254 talking about? 255 256 ## Data races and atomic types 257 258 Before starting this project I knew the basics of multi-threaded 259 programming with [pthreads](https://en.wikipedia.org/wiki/Pthreads). But 260 I had (and probably still have) many things to learn. 261 262 For example, I used to think that having a thread read a value that 263 another thread is possibly writing at the same time could be 264 done safely, without using any 265 [lock](https://en.wikipedia.org/wiki/Lock_(computer_science)). 266 267 Assuming you don't care if the value you read comes from before or after 268 the write from the other thread, which was my case, this sounds like 269 a reasonable assumption. It turns out that this assumption is called 270 *sequential consistency*, and that it is almost always wrong. 271 272 One way to confirm that a concurrent read + write is undefined 273 behavior in C and C++ is compiling with `-fsanitize=thread`, which 274 will rightfully scream at you if one such data race happens. Luckily, 275 it is not necessary to use a mutex for this kind of operation: 276 starting from C11 and C++11 you can just declare your variable 277 [`_Atomic`](https://en.cppreference.com/w/c/language/atomic) and happily 278 read and modify it at the same time. This comes at a performance cost, 279 but this cost is smaller than that of a traditional lock. 280 281 ## Single instruction, multiple data 282 283 [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) 284 is a type of parallelism implemented in pretty much every CPU made in 285 the last 20 years or so. You can call SIMD instructions directly from 286 high-level code using *vector intrinsics*. For example, the following 287 code computes four sums of 64-bit integers at the same time: 288 289 ``` 290 /* Compile with -mavx2 option */ 291 292 #include <stdio.h> 293 #include <immintrin.h> 294 295 int main() { 296 long long r[4]; 297 298 /* Initialize the 4x 64-bit int data structures */ 299 __m256i a = _mm256_set_epi64x(23, 42, 1234567890123456789, -1000); 300 __m256i b = _mm256_set_epi64x(-1, 69, -1234567890123456789, 3); 301 302 /* Compute the sums */ 303 __m256i c = _mm256_add_epi64(a, b); 304 305 /* Store back the result into a buffer */ 306 _mm256_storeu_si256((__m256i *)r, c); 307 308 /* Print the results (last is first) */ 309 printf("Results:\n%lld\n%lld\n%lld\n%lld\n", r[0], r[1], r[2], r[3]); 310 } 311 ``` 312 313 The `__mm256_add_epi64()` used above is not just syntactic 314 sugar: the compiler translates it directly to a single 315 `vpaddq` assembly instruction. You can check this on 316 [godbolt](https://godbolt.org/z/d6TjPzKPG), or by compiling the code 317 above with `gcc -S -mavx2 file.c`. 318 319 Adding numbers is not the only thing that can be done with SIMD: there 320 are also instructions for other operations, including logic operators 321 and permutations of blocks of bits. The latter was particularly relevant 322 to me, this project being a Rubik's cube solver. 323 324 Using these instructions gave me a similar feeling to when I wrote 325 multi-threaded code for the first time: it looks scary at first 326 because of the weird syntax, but it was actually pretty easy to do 327 what I wanted once I got started. 328 329 ## Testing 330 331 It sounds completely trivial now, but I did not do much testing 332 in my previous personal projects. But rewriting from scratch allowed 333 me to do the right thing from the start and implement a proper 334 testing suite. 335 336 I used different types of tests at different stage of the development. 337 In ealy stages, when I was building the core routines of the library, 338 I relied more on [unit tests](https://en.wikipedia.org/wiki/Unit_testing), 339 and even did some 340 [test-driven development](https://en.wikipedia.org/wiki/Test-driven_development). 341 These tests ran always in debug mode with *sanitizers* switched on (see 342 above) and enabled me to catch errors early and be confident that, once 343 the tests passed, the code was correct. On later stages this strategy 344 was hard to apply, because the high-level functions required working 345 with large arrays of data that were multiple gigabytes in size; they 346 could not be tested in isolation. 347 348 Being completely nuts as I am, I set up the whole testing system 349 from the ground up, but this turned out to be extremely practical and also 350 quite powerful in its simplicity. In a folder called `test` I had a 351 `test.sh` file and one sub-folder for each test. Each test consists 352 of a (usually very short) `.c` file and one or more *test cases*, 353 i.e. a pair of `file.in` (input file) and `file.out` (expected output) 354 files. The `test.sh` script, which I call with `make test`, builds and 355 runs each test case and compares the result to the expected output using 356 diff(1). Optionally, one case use an environment variable to pick which 357 test(s) to run. 358 359 If you are wondering why I set this whole thing up and I did not use 360 some test framework, that's a fair question. The answer is that I like to 361 understand every part of something I build, and I have quite a distaste 362 for external dependencies. Moreover, I was not in a rush to complete 363 this project - I was happy to build it piece by piece and possibly go 364 back and change everything if I discovered a better way to do things. 365 366 By doing everything by hand I ended up re-discovering many well-known 367 tricks on my own, such as [callback functions](../2024-06-20-callback-log) 368 and [some macro magic](../2023-11-14-test-visibility-c-macro). 369 370 But in the end it does not matter that much what type of tests or what 371 framework one uses, as long as they are automated and easy to run. 372 373 ## Structuring the project as a library 374 375 This one is a rare instance of learning something not by banging my head 376 against a wall until either cracks, but instead by making a good call 377 intuitively early on and confirming it empirically. 378 379 When I started the rewrite I had in the the back of my mind the idea 380 of building multiple UIs for this tool, possibly using different 381 languages and frameworks. So I decided to structure the core of the project 382 as a library *with a very thin interface*. This meant not exposing many 383 functions, types or constants to simplify writing adapter code for 384 other languages. Indeed I avoided exposing *any* type at all and only 385 used standard integer types and strings. This made it easy to implement 386 Python bindings (see below) and at the same time it gave me more freedom 387 to reorganize the code internally without breaking the API. 388 389 Another thing that I did was deferring all I/O and memory management 390 to the *user* of the library. This implied for example using 391 [callback functions](../2024-06-20-callback-log) for logging and, 392 as discussed at the beginning of this post, returning string output via a 393 `char *` buffer provided by the caller. This forced me to write in a very 394 clean and safe style - there is only one `malloc()` in the core library! 395 396 ## Interfacing with Python 397 398 This project was developed as a library, but I still needed to run the 399 code to test it, of course. To do this, at first I developed a simple 400 command line tool to call this library's functions directly. However 401 parsing command line options by hand can easily become messy. 402 403 At some point I thought it would be cool to write some Python adapters 404 for my library and use the Python REPL instead of my custom CLI. Not only 405 would this be more powerful because of the ability to run arbitrary code, 406 it would also be easier to write and maintain. It was something new for 407 me, but once again it turned to be not too hard. I wrote about the first 408 steps in [a blog post](..2024-10-08-python-c). 409 410 ## Accepting contributors 411 412 Around April last year a friend asked me if he could work on this 413 project as part of his bachelor thesis in computer science. And I said 414 no, because I wanted this to be my personal, slow-paced project and I 415 could not assure him that it would be in a stable state while he was 416 working on his thesis. So I made 417 [a spin-off project](https://git.tronto.net/cubecore) 418 and told him he could base his work on that. I offered to help if he 419 had any questions on the topice, and that was it. 420 421 Bur when he asked again to contribute a few months later, the situation 422 had changed. The project was in a stable state and most of the preliminary 423 work was done, so I decided to let him have fun and implement one of the 424 core parts of the tool. I started reviewing his contributions, and I had 425 to find the right balance between keeping the code consistent with what 426 I had mind and let him submit his code without requesting unnecessary 427 changes. This is something I regularly do for work, but doing this for 428 a project of which I am the creator and main author felt much different! 429 430 Besides the learning experience, this also had some technical benefits, 431 mostly related to the fact that we are using different compilers, 432 operating system and CPU architectures: now I have a library that compiles 433 with both GCC and Clang, runs on Linux and MacOS, and is optimized for 434 both x86 with 435 [AVX2 extensions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2) 436 and ARM with 437 [NEON](https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)). 438 439 ## Conclusion 440 441 Rewriting from scratch is rarely the fastest way to fix a project, 442 but it feels nice and, as this post shows, it is a great opportunity to 443 learn cool new stuff.