learned-rewrite.md - sebastiano.tronto.net - Source files and build scripts for my personal website

learned-rewrite.md (19233B)
      1 # Things I learned rewriting a project from scratch
      2 
      3 Around two years ago I [wrote](../2023-04-10-the-big-rewrite) about a
      4 project I have been working on and off for a few years. In that post
      5 I explain how and why I decided to rewrite it from scracth, splitting
      6 it in multiple parts. I ended up not following exactly the plan that I
      7 laid out, and I also worked on the rewrite more slowly than I wanted.
      8 But rewriting from scratch allowed me to experiment with new ideas and
      9 discover new tricks, and after two years I learned enough stuff that I
     10 can make a blog post about it!
     11 
     12 The project I am talking about is [Nissy](https://nissy.tronto.net), a
     13 command-line Rubik's Cube solver that is quite popular in the very small
     14 niche of speedcubers interested in the *Fewest Moves Challenge*, which
     15 consists in solving the puzzle with the smallest amount of moves rather
     16 than as fast as possible. Both the old and the new versions are written
     17 in C, but not all the things I discuss in this post are language-specific.
     18 
     19 You can find the new version of this project, the "Big Rewrite",
     20 in [this repository](https://git.tronto.net/nissy-core) - or
     21 [on github](https://github.com/sebastianotronto/nissy-core), if you prefer.
     22 I decided to rename it "H48" back when I planned to have separate projects
     23 for the different features of the old version, but I may change it back
     24 to Nissy at some point.
     25 
     26 But now let's get into it!
     27 
     28 ## Better memory safety (yes, even in C)
     29 
     30 Let's start with a mistake that I think many inexperienced C programmers
     31 make: in the previous iterations of this project, I used a lot of
     32 `malloc()`s. And although I was quite diligent in always freeing the
     33 memory I used, in most cases I would have been better off not using
     34 dynamic memory allocation at all.
     35 
     36 As an example, let's consider the case of converting some data structure
     37 into a string. One way to do it would be something like this:
     38 
     39 ```
     40 char *mytype_to_string(const mytype_t *x) {
     41 	const size_t str_size = 512;
     42 	char *str = malloc(str_size);
     43 
     44 	/* Write stuff to str here */
     45 
     46 	return str;
     47 }
     48 ```
     49 
     50 If you are familiar with
     51 [garbage-collected languages](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)),
     52 you may find the code above nice and clean. However, since with C we have
     53 to manage our memory manually, the caller of this function would have
     54 to remember to `free()` the pointer returned by `my_type_to_string()`.
     55 This is not rocket science, but it is a common source of mistakes.
     56 
     57 However, a safer pattern for this kind of function would be the following:
     58 
     59 ```
     60 int mytype_to_string(const mytype_t *x, size_t len, char buffer[len]) {
     61 	const size_t str_size = 512;
     62 	if (len < str_size) {
     63 		/* Handle error in some way, for example by returning -1 */
     64 	}
     65 
     66 	/* Write stuff to buffer here */
     67 
     68 	return 0; /* Or return number of bytes written */
     69 }
     70 ```
     71 
     72 In this new version of the code, instead of returning a string, the
     73 function takes a `char` buffer and its lenght as a parameter. Notice
     74 that we are **not** passing the buffer by value, as the notation
     75 `buffer[len]` might suggest: this function's signature is equivalent to
     76 `my_type_to_string(const mytype_t *, size_t, char *)`, but by using the
     77 square-bracket notation the compiler may be able to catch a too-small
     78 buffer and warn us about it.
     79 
     80 Now we can get rid of the malloc entirely, because the caller can
     81 just allocate the char buffer on its stack:
     82 
     83 ```
     84 void doing_stuff(mytype_t x) {
     85 	const size_t len = 1024;
     86 	char str[len];
     87 	if (mytype_to_string(&x, len, str) != 0) {
     88 		/* Handle error */
     89 	}
     90 
     91 	/* ... */
     92 }
     93 ```
     94 
     95 Unfortuntely, this new pattern leads to a small problem: the caller
     96 must now be aware of the number of characters required for converting
     97 `mytype_t` to a string. There are a couple of ways to solve this problem,
     98 including handling a `NULL` first parameter as a "dry-run" request that
     99 returns the required size without writing anything, or using a global
    100 constant.
    101 
    102 This brings me to another feature I have started using more and more
    103 recently. Say we want to use a global constant for the size of `mytype_t`.
    104 We could do something like this:
    105 
    106 ```
    107 #define MYTYPE_SIZE 512
    108 
    109 int mytype_to_string(const mytype_t *x, char buffer[static MYTYPE_SIZE]) {
    110 	/* Write to buffer here, no error handling needed */
    111 
    112 	return 0; /* Or return number of bytes written */
    113 }
    114 
    115 void doing_stuff(mytype_t x) {
    116 	char str[MYTYPE_SIZE];
    117 	mytype_to_string(&x, str);
    118 
    119 	/* ... */
    120 }
    121 ```
    122 
    123 Notice how I used `[static MYTYPE_SIZE]` to denote the required size
    124 of the buffer parameter. This tells the compiler that the function
    125 `mytype_to_string()` can assume that the buffer must reference an area
    126 of memory of size at least `MYTYPE_SIZE`. In turn, the compiler can
    127 use this information not only to optimize the code, but also to give
    128 useful warnings.  This trick can also be used to tell the compiler that
    129 a pointer cannot be `NULL`: a pointer is not `NULL` if it refers to an
    130 valid area of memory of size at least 1. So we can declare a function
    131 as follows:
    132 
    133 ```
    134 int mytype_to_string(const mtytype_t x[static 1], char buffer[static MYTYPE_SIZE]) {
    135 	/* ... */
    136 }
    137 ```
    138 
    139 With this we can come up with a set of rules to never de-reference a
    140 NULL pointer.  A pointer must be either:
    141 
    142 1. Obtained by de-referencing an object allocated in current function.
    143 2. A parameter declared as `[static 1]`.
    144 3. Checked for `NULL` before de-referencing.
    145 
    146 Having more memory safety baked into the language would be more convenient,
    147 but at least we can use some mitigations.
    148 
    149 ## Sanitizers
    150 
    151 Another tool that help me improve the memory management of my code are
    152 [sanitizers](https://en.wikipedia.org/wiki/Code_sanitizer).
    153 They were suggested to me by [Tomas Rokicki](https://tomas.rokicki.com)
    154 when I shared my struggles with
    155 [an old bug](../2023-05-05-debug-smartphone), and now that I got used
    156 to them I am wondering how I even survived programming in C without them.
    157 Sanitizers are *compiler intrumentations* supported by the major C and
    158 C++ compilers - namely GCC and Clang. They add some runtime checks to
    159 a program that make it crash with a clear message if they detect
    160 a memory error, such as an access outside the bounds of an array, or a
    161 race condition. For example, say you try to access some out-of-bound
    162 element of an array:
    163 
    164 ```
    165 /* This is a.c */
    166 
    167 #include <stdio.h>
    168 
    169 int main() {
    170         int a[4] = {0, 1, 2, 3};
    171 
    172         printf("Out-of-bound value: %d\n", a[4]);
    173 }
    174 ```
    175 
    176 Normally, this program just runs fine and prints some garbage value:
    177 
    178 ```
    179 $ cc a.c && ./a.out
    180 Out-of-bound value: 1573265008
    181 ```
    182 
    183 Notice how the compiler did not even give us a warning!
    184 However, if we compile with the address sanitizer enabled:
    185 
    186 ```
    187 $ gcc -fsanitize=address a.c
    188 $ ./a.out
    189 =================================================================
    190 ==15152==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fc4e2500030 at pc 0x000000401331 bp 0x7ffdffd953c0 sp 0x7ffdffd953b8
    191 READ of size 4 at 0x7fc4e2500030 thread T0
    192     #0 0x401330 in main (/home/sebastiano/a.out+0x401330) (BuildId: e8411facf942864956c817b9be7bee9868ec6065)
    193     #1 0x7fc4e4a10247 in __libc_start_call_main (/lib64/libc.so.6+0x3247) (BuildId: f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93)
    194     #2 0x7fc4e4a1030a in __libc_start_main_alias_1 (/lib64/libc.so.6+0x330a) (BuildId: f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93)
    195     #3 0x4010d4 in _start (/home/sebastiano/a.out+0x4010d4) (BuildId: e8411facf942864956c817b9be7bee9868ec6065)
    196 
    197 Address 0x7fc4e2500030 is located in stack of thread T0 at offset 48 in frame
    198     #0 0x4011a5 in main (/home/sebastiano/a.out+0x4011a5) (BuildId: e8411facf942864956c817b9be7bee9868ec6065)
    199 
    200   This frame has 1 object(s):
    201     [32, 48) 'a' (line 4) <== Memory access at offset 48 overflows this variable
    202 HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
    203       (longjmp and C++ exceptions *are* supported)
    204 SUMMARY: AddressSanitizer: stack-buffer-overflow (/home/sebastiano/a.out+0x401330) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) in main
    205 Shadow bytes around the buggy address:
    206   0x7fc4e24ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    207   0x7fc4e24ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    208   0x7fc4e24ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    209   0x7fc4e24fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    210   0x7fc4e24fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    211 =>0x7fc4e2500000: f1 f1 f1 f1 00 00[f3]f3 00 00 00 00 00 00 00 00
    212   0x7fc4e2500080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    213   0x7fc4e2500100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    214   0x7fc4e2500180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    215   0x7fc4e2500200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    216   0x7fc4e2500280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    217 Shadow byte legend (one shadow byte represents 8 application bytes):
    218   Addressable:           00
    219   Partially addressable: 01 02 03 04 05 06 07 
    220   Heap left redzone:       fa
    221   Freed heap region:       fd
    222   Stack left redzone:      f1
    223   Stack mid redzone:       f2
    224   Stack right redzone:     f3
    225   Stack after return:      f5
    226   Stack use after scope:   f8
    227   Global redzone:          f9
    228   Global init order:       f6
    229   Poisoned by user:        f7
    230   Container overflow:      fc
    231   Array cookie:            ac
    232   Intra object redzone:    bb
    233   ASan internal:           fe
    234   Left alloca redzone:     ca
    235   Right alloca redzone:    cb
    236 ==15152==ABORTIN
    237 ```
    238 
    239 A bit overwhelming, but I certainly prefer this over getting random
    240 garbage results!
    241 
    242 Similarly, one can check for undefined behavior (`-fsanitize=undefined`),
    243 data races (`-fsanitize=thread`), memory leaks (`-fsanitize=leak`) and
    244 more. If you never used sanitizers, you should start now!
    245 
    246 Sanitizers is that they are *runtime* checks.  This means that your
    247 program running fine with sanitizers enabled does not guarantee the
    248 absence of errors. Moreover, a program compiled with sanitizers enabled
    249 will run much slower than normal - so you should only use them in your
    250 debug builds.
    251 
    252 It would be amazing to have this kind of checks performed at compile
    253 time - I wonder if this is this "Rust" thing all the cool kids are
    254 talking about?
    255 
    256 ## Data races and atomic types
    257 
    258 Before starting this project I knew the basics of multi-threaded
    259 programming with [pthreads](https://en.wikipedia.org/wiki/Pthreads). But
    260 I had (and probably still have) many things to learn.
    261 
    262 For example, I used to think that having a thread read a value that
    263 another thread is possibly writing at the same time could be
    264 done safely, without using any
    265 [lock](https://en.wikipedia.org/wiki/Lock_(computer_science)).
    266 
    267 Assuming you don't care if the value you read comes from before or after
    268 the write from the other thread, which was my case, this sounds like
    269 a reasonable assumption. It turns out that this assumption is called
    270 *sequential consistency*, and that it is almost always wrong.
    271 
    272 One way to confirm that a concurrent read + write is undefined
    273 behavior in C and C++ is compiling with `-fsanitize=thread`, which
    274 will rightfully scream at you if one such data race happens. Luckily,
    275 it is not necessary to use a mutex for this kind of operation:
    276 starting from C11 and C++11 you can just declare your variable
    277 [`_Atomic`](https://en.cppreference.com/w/c/language/atomic) and happily
    278 read and modify it at the same time. This comes at a performance cost,
    279 but this cost is smaller than that of a traditional lock.
    280 
    281 ## Single instruction, multiple data
    282 
    283 [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data)
    284 is a type of parallelism implemented in pretty much every CPU made in
    285 the last 20 years or so.  You can call SIMD instructions directly from
    286 high-level code using *vector intrinsics*. For example, the following
    287 code computes four sums of 64-bit integers at the same time:
    288 
    289 ```
    290 /* Compile with -mavx2 option */
    291 
    292 #include <stdio.h>
    293 #include <immintrin.h>
    294 
    295 int main() {
    296 	long long r[4];
    297 
    298 	/* Initialize the 4x 64-bit int data structures */
    299 	__m256i a = _mm256_set_epi64x(23, 42, 1234567890123456789, -1000);
    300 	__m256i b = _mm256_set_epi64x(-1, 69, -1234567890123456789, 3);
    301 
    302 	/* Compute the sums */
    303 	__m256i c = _mm256_add_epi64(a, b);
    304 
    305 	/* Store back the result into a buffer */
    306 	_mm256_storeu_si256((__m256i *)r, c);
    307 
    308 	/* Print the results (last is first) */
    309 	printf("Results:\n%lld\n%lld\n%lld\n%lld\n", r[0], r[1], r[2], r[3]);
    310 }
    311 ```
    312 
    313 The `__mm256_add_epi64()` used above is not just syntactic
    314 sugar: the compiler translates it directly to a single
    315 `vpaddq` assembly instruction. You can check this on
    316 [godbolt](https://godbolt.org/z/d6TjPzKPG), or by compiling the code
    317 above with `gcc -S -mavx2 file.c`.
    318 
    319 Adding numbers is not the only thing that can be done with SIMD: there
    320 are also instructions for other operations, including logic operators
    321 and permutations of blocks of bits. The latter was particularly relevant
    322 to me, this project being a Rubik's cube solver.
    323 
    324 Using these instructions gave me a similar feeling to when I wrote
    325 multi-threaded code for the first time: it looks scary at first
    326 because of the weird syntax, but it was actually pretty easy to do
    327 what I wanted once I got started.
    328 
    329 ## Testing
    330 
    331 It sounds completely trivial now, but I did not do much testing
    332 in my previous personal projects. But rewriting from scratch allowed
    333 me to do the right thing from the start and implement a proper
    334 testing suite.
    335 
    336 I used different types of tests at different stage of the development.
    337 In ealy stages, when I was building the core routines of the library,
    338 I relied more on [unit tests](https://en.wikipedia.org/wiki/Unit_testing),
    339 and even did some
    340 [test-driven development](https://en.wikipedia.org/wiki/Test-driven_development).
    341 These tests ran always in debug mode with *sanitizers* switched on (see
    342 above) and enabled me to catch errors early and be confident that, once
    343 the tests passed, the code was correct.  On later stages this strategy
    344 was hard to apply, because the high-level functions required working
    345 with large arrays of data that were multiple gigabytes in size; they
    346 could not be tested in isolation.
    347 
    348 Being completely nuts as I am, I set up the whole testing system
    349 from the ground up, but this turned out to be extremely practical and also
    350 quite powerful in its simplicity. In a folder called `test` I had a
    351 `test.sh` file and one sub-folder for each test. Each test consists
    352 of a (usually very short) `.c` file and one or more *test cases*,
    353 i.e. a pair of `file.in` (input file) and `file.out` (expected output)
    354 files. The `test.sh` script, which I call with `make test`, builds and
    355 runs each test case and compares the result to the expected output using
    356 diff(1). Optionally, one case use an environment variable to pick which
    357 test(s) to run.
    358 
    359 If you are wondering why I set this whole thing up and I did not use
    360 some test framework, that's a fair question. The answer is that I like to
    361 understand every part of something I build, and I have quite a distaste
    362 for external dependencies. Moreover, I was not in a rush to complete
    363 this project - I was happy to build it piece by piece and possibly go
    364 back and change everything if I discovered a better way to do things.
    365 
    366 By doing everything by hand I ended up re-discovering many well-known
    367 tricks on my own, such as [callback functions](../2024-06-20-callback-log)
    368 and [some macro magic](../2023-11-14-test-visibility-c-macro).
    369 
    370 But in the end it does not matter that much what type of tests or what
    371 framework one uses, as long as they are automated and easy to run.
    372 
    373 ## Structuring the project as a library
    374 
    375 This one is a rare instance of learning something not by banging my head
    376 against a wall until either cracks, but instead by making a good call
    377 intuitively early on and confirming it empirically.
    378 
    379 When I started the rewrite I had in the the back of my mind the idea
    380 of building multiple UIs for this tool, possibly using different
    381 languages and frameworks. So I decided to structure the core of the project
    382 as a library *with a very thin interface*. This meant not exposing many
    383 functions, types or constants to simplify writing adapter code for
    384 other languages. Indeed I avoided exposing *any* type at all and only
    385 used standard integer types and strings.  This made it easy to implement
    386 Python bindings (see below) and at the same time it gave me more freedom
    387 to reorganize the code internally without breaking the API.
    388 
    389 Another thing that I did was deferring all I/O and memory management
    390 to the *user* of the library. This implied for example using
    391 [callback functions](../2024-06-20-callback-log) for logging and,
    392 as discussed at the beginning of this post, returning string output via a
    393 `char *` buffer provided by the caller. This forced me to write in a very
    394 clean and safe style - there is only one `malloc()` in the core library!
    395 
    396 ## Interfacing with Python
    397 
    398 This project was developed as a library, but I still needed to run the
    399 code to test it, of course. To do this, at first I developed a simple
    400 command line tool to call this library's functions directly. However
    401 parsing command line options by hand can easily become messy.
    402 
    403 At some point I thought it would be cool to write some Python adapters
    404 for my library and use the Python REPL instead of my custom CLI. Not only
    405 would this be more powerful because of the ability to run arbitrary code,
    406 it would also be easier to write and maintain. It was something new for
    407 me, but once again it turned to be not too hard. I wrote about the first
    408 steps in [a blog post](..2024-10-08-python-c).
    409 
    410 ## Accepting contributors
    411 
    412 Around April last year a friend asked me if he could work on this
    413 project as part of his bachelor thesis in computer science. And I said
    414 no, because I wanted this to be my personal, slow-paced project and I
    415 could not assure him that it would be in a stable state while he was
    416 working on his thesis. So I made
    417 [a spin-off project](https://git.tronto.net/cubecore)
    418 and told him he could base his work on that. I offered to help if he
    419 had any questions on the topice, and that was it.
    420 
    421 Bur when he asked again to contribute a few months later, the situation
    422 had changed. The project was in a stable state and most of the preliminary
    423 work was done, so I decided to let him have fun and implement one of the
    424 core parts of the tool. I started reviewing his contributions, and I had
    425 to find the right balance between keeping the code consistent with what
    426 I had mind and let him submit his code without requesting unnecessary
    427 changes. This is something I regularly do for work, but doing this for
    428 a project of which I am the creator and main author felt much different!
    429 
    430 Besides the learning experience, this also had some technical benefits,
    431 mostly related to the fact that we are using different compilers,
    432 operating system and CPU architectures: now I have a library that compiles
    433 with both GCC and Clang, runs on Linux and MacOS, and is optimized for
    434 both x86 with
    435 [AVX2 extensions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2)
    436 and ARM with
    437 [NEON](https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)).
    438 
    439 ## Conclusion
    440 
    441 Rewriting from scratch is rarely the fastest way to fix a project,
    442 but it feels nice and, as this post shows, it is a great opportunity to
    443 learn cool new stuff.
	sebastiano.tronto.net Source files and build scripts for my personal website
	git clone https://git.tronto.net/sebastiano.tronto.net
	Download \| Log \| Files \| Refs \| README