sebastiano.tronto.net

Source files and build scripts for my personal website
git clone https://git.tronto.net/sebastiano.tronto.net
Download | Log | Files | Refs | README

commit 6d6bbac675d4d511cbb981670761232dac40300a
parent 9ee63bff4c8c631e1cb69944ed49e64a9f91f94c
Author: Sebastiano Tronto <sebastiano@tronto.net>
Date:   Sat,  5 Apr 2025 11:08:47 +0200

Minor changes

Diffstat:
Asrc/blog/2025-04-05-learned-rewrite/learned-rewrite.md | 443+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dsrc/blog/2025-04-05-learnt-rewrite/learned-rewrite.md | 444-------------------------------------------------------------------------------
2 files changed, 443 insertions(+), 444 deletions(-)

diff --git a/src/blog/2025-04-05-learned-rewrite/learned-rewrite.md b/src/blog/2025-04-05-learned-rewrite/learned-rewrite.md @@ -0,0 +1,443 @@ +# Things I learned rewriting a project from scratch + +Around two years ago I [wrote](../2023-04-10-the-big-rewrite) about a +project I have been working on and off for a few years. In that post +I explain how and why I decided to rewrite it from scracth, splitting +it in multiple parts. I ended up not following exactly the plan that I +laid out, and I also worked on the rewrite more slowly than I wanted. +But rewriting from scratch allowed me to experiment with new ideas and +discover new tricks, and after two years I learned enough stuff that I +can make a blog post about it! + +The project I am talking about is [Nissy](https://nissy.tronto.net), a +command-line Rubik's Cube solver that is quite popular in the very small +niche of speedcubers interested in the *Fewest Moves Challenge*, which +consists in solving the puzzle with the smallest amount of moves rather +than as fast as possible. Both the old and the new versions are written +in C, but not all the things I discuss in this post are language-specific. + +You can find the new version of this project, the "Big Rewrite", +in [this repository](https://git.tronto.net/h48) - or +[on github](https://github.com/sebastianotronto/h48), if you prefer. +I decided to rename it "H48" back when I planned to have separate projects +for the different features of the old version, but I may change it back +to Nissy at some point. + +But now let's get into it! + +## Better memory safety (yes, even in C) + +Let's start with a mistake that I think many inexperienced C programmers +make: in the previous iterations of this project, I used a lot of +`malloc()`s. And although I was quite diligent in always freeing the +memory I used, in most cases I would have been better off not using +dynamic memory allocation at all. + +As an example, let's consider the case of converting some data structure +into a string. One way to do it would be something like this: + +``` +char *mytype_to_string(const mytype_t *x) { + const size_t str_size = 512; + char *str = malloc(str_size); + + /* Write stuff to str here */ + + return str; +} +``` + +If you are familiar with +[garbage-collected languages](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)), +you may find the code above nice and clean. However, since with C we have +to manage our memory manually, the caller of this function would have +to remember to `free()` the pointer returned by `my_type_to_string()`. +This is not rocket science, but it is a common source of mistakes. + +However, a safer pattern for this kind of function would be the following: + +``` +int mytype_to_string(const mytype_t *x, size_t len, char buffer[len]) { + const size_t str_size = 512; + if (len < str_size) { + /* Handle error in some way, for example by returning -1 */ + } + + /* Write stuff to buffer here */ + + return 0; /* Or return number of bytes written */ +} +``` + +In this new version of the code, instead of returning a string, the +function takes a `char` buffer and its lenght as a parameter. Notice +that we are **not** passing the buffer by value, as the notation +`buffer[len]` might suggest: this function's signature is equivalent to +`my_type_to_string(const mytype_t *, size_t, char *)`, but by using the +square-bracket notation the compiler may be able to catch a too-small +buffer and warn us about it. + +Now we can get rid of the malloc entirely, because the caller can +just allocate the char buffer on its stack: + +``` +void doing_stuff(mytype_t x) { + const size_t len = 1024; + char str[len]; + if (mytype_to_string(&x, len, str) != 0) { + /* Handle error */ + } + + /* ... */ +} +``` + +Unfortuntely, this new pattern leads to a small problem: the caller +must now be aware of the number of characters required for converting +`mytype_t` to a string. There are a couple of ways to solve this problem, +including handling a `NULL` first parameter as a "dry-run" request that +returns the required size without writing anything, or using a global +constant. + +This brings me to another feature I have started using more and more +recently. Say we want to use a global constant for the size of `mytype_t`. +We could do something like this: + +``` +#define MYTYPE_SIZE 512 + +int mytype_to_string(const mytype_t *x, char buffer[static MYTYPE_SIZE]) { + /* Write to buffer here, no error handling needed */ + + return 0; /* Or return number of bytes written */ +} + +void doing_stuff(mytype_t x) { + char str[MYTYPE_SIZE]; + mytype_to_string(&x, str); + + /* ... */ +} +``` + +Notice how I used `[static MYTYPE_SIZE]` to denote the required size +of the buffer parameter. This tells the compiler that the function +`mytype_to_string()` can assume that the buffer must reference an area +of memory of size at least `MYTYPE_SIZE`. In turn, the compiler can +use this information not only to optimize the code, but also to give +useful warnings. This trick can also be used to tell the compiler that +a pointer cannot be `NULL`: a pointer is not `NULL` if it refers to an +valid area of memory of size at least 1. So we can declare a function +as follows: + +``` +int mytype_to_string(const mtytype_t x[static 1], char buffer[static MYTYPE_SIZE]) { + /* ... */ +} +``` + +With this we can come up with a set of rules to never de-reference a +NULL pointer. A pointer must be either: + +1. Obtained by de-referencing an object allocated in current function. +2. A parameter declared as `[static 1]`. +3. Checked for `NULL` before de-referencing. + +Having more memory safety baked into the language would be more convenient, +but at least we can use some mitigations. + +## Sanitizers + +Another tool that help me improve the memory management of my code are +[sanitizers](https://en.wikipedia.org/wiki/Code_sanitizer). +They were suggested to me by [Tomas Rokicki](https://tomas.rokicki.com) +when I shared my struggles with +[an old bug](../2023-05-05-debug-smartphone), and now that I got used +to them I am wondering how I even survived programming in C without them. +Sanitizers are *compiler intrumentations* supported by the major C and +C++ compilers - namely GCC and Clang. They add some runtime checks to +a program that make it crash with a clear message if they detect +a memory error, such as an access outside the bounds of an array, or a +race condition. For example, say you try to access some out-of-bound +element of an array: + +``` +/* This is a.c */ + +#include <stdio.h> + +int main() { + int a[4] = {0, 1, 2, 3}; + + printf("Out-of-bound value: %d\n", a[4]); +} +``` + +Normally, this program just runs fine and prints some garbage value: + +``` +$ cc a.c && ./a.out +Out-of-bound value: 1573265008 +``` + +Notice how the compiler did not even give us a warning! +However, if we compile with the address sanitizer enabled: + +``` +$ gcc -fsanitize=address a.c +$ ./a.out +================================================================= +==15152==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fc4e2500030 at pc 0x000000401331 bp 0x7ffdffd953c0 sp 0x7ffdffd953b8 +READ of size 4 at 0x7fc4e2500030 thread T0 + #0 0x401330 in main (/home/sebastiano/a.out+0x401330) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) + #1 0x7fc4e4a10247 in __libc_start_call_main (/lib64/libc.so.6+0x3247) (BuildId: f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93) + #2 0x7fc4e4a1030a in __libc_start_main_alias_1 (/lib64/libc.so.6+0x330a) (BuildId: f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93) + #3 0x4010d4 in _start (/home/sebastiano/a.out+0x4010d4) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) + +Address 0x7fc4e2500030 is located in stack of thread T0 at offset 48 in frame + #0 0x4011a5 in main (/home/sebastiano/a.out+0x4011a5) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) + + This frame has 1 object(s): + [32, 48) 'a' (line 4) <== Memory access at offset 48 overflows this variable +HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork + (longjmp and C++ exceptions *are* supported) +SUMMARY: AddressSanitizer: stack-buffer-overflow (/home/sebastiano/a.out+0x401330) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) in main +Shadow bytes around the buggy address: + 0x7fc4e24ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0x7fc4e24ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0x7fc4e24ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0x7fc4e24fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0x7fc4e24fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +=>0x7fc4e2500000: f1 f1 f1 f1 00 00[f3]f3 00 00 00 00 00 00 00 00 + 0x7fc4e2500080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0x7fc4e2500100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0x7fc4e2500180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0x7fc4e2500200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + 0x7fc4e2500280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +Shadow byte legend (one shadow byte represents 8 application bytes): + Addressable: 00 + Partially addressable: 01 02 03 04 05 06 07 + Heap left redzone: fa + Freed heap region: fd + Stack left redzone: f1 + Stack mid redzone: f2 + Stack right redzone: f3 + Stack after return: f5 + Stack use after scope: f8 + Global redzone: f9 + Global init order: f6 + Poisoned by user: f7 + Container overflow: fc + Array cookie: ac + Intra object redzone: bb + ASan internal: fe + Left alloca redzone: ca + Right alloca redzone: cb +==15152==ABORTIN +``` + +A bit overwhelming, but I certainly prefer this over getting random +garbage results! + +Similarly, one can check for undefined behavior (`-fsanitize=undefined`), +data races (`-fsanitize=thread`), memory leaks (`-fsanitize=leak`) and +more. If you never used sanitizers, you should start now! + +Sanitizers is that they are *runtime* checks. This means that your +program running fine with sanitizers enabled does not guarantee the +absence of errors. Moreover, a program compiled with sanitizers enabled +will run much slower than normal - so you should only use them in your +debug builds. + +It would be amazing to have this kind of checks performed at compile +time - I wonder if this is this "Rust" thing all the cool kids are +talking about? + +## Data races and atomic types + +Before starting this project I knew the basics of multi-threaded +programming with [pthreads](https://en.wikipedia.org/wiki/Pthreads). But +I had (and probably still have) many things to learn. + +For example, I used to think that having a thread read a value that +another thread is possibly writing at the same time could be +done safely, without using any +[lock](https://en.wikipedia.org/wiki/Lock_(computer_science)). + +Assuming you don't care if the value you read comes from before or after +the write from the other thread, which was my case, this sounds like +a reasonable assumption. It turns out that this assumption is called +*sequential consistency*, and that it is almost always wrong. + +One way to confirm that a concurrent read + write is undefined +behavior in C and C++ is compiling with `-fsanitize=thread`, which +will rightfully scream at you if one such data race happens. Luckily, +it is not necessary to use a mutex for this kind of operation: +starting from C11 and C++11 you can just declare your variable +[`_Atomic`](https://en.cppreference.com/w/c/language/atomic) and happily +read and modify it at the same time. This comes at a performance cost, +but this cost is smaller than that of a traditional lock. + +## Single instruction, multiple data + +[SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) +is a type of parallelism implemented in pretty much every CPU made in +the last 20 years or so. You can call SIMD instructions directly from +high-level code using *vector intrinsics*. For example, the following +code computes four sums of 64-bit integers at the same time: + +``` +/* Compile with -mavx2 option */ + +#include <stdio.h> +#include <immintrin.h> + +int main() { + long long r[4]; + + /* Initialize the 4x 64-bit int data structures */ + __m256i a = _mm256_set_epi64x(23, 42, 1234567890123456789, -1000); + __m256i b = _mm256_set_epi64x(-1, 69, -1234567890123456789, 3); + + /* Compute the sums */ + __m256i c = _mm256_add_epi64(a, b); + + /* Store back the result into a buffer */ + _mm256_storeu_si256((__m256i *)r, c); + + /* Print the results (last is first) */ + printf("Results:\n%lld\n%lld\n%lld\n%lld\n", r[0], r[1], r[2], r[3]); +} +``` + +The `__mm256_add_epi64()` used above is not just syntactic +sugar: the compiler translates it directly to a single +`vpaddq` assembly instruction. You can check this on +[godbolt](https://godbolt.org/z/d6TjPzKPG), or by compiling the code +above with `gcc -S -mavx2 file.c`. + +Adding numbers is not the only thing that can be done with SIMD: there +are also instructions for other operations, including logic operators +and permutations of blocks of bits. The latter was particularly relevant +to me, this project being a Rubik's cube solver. + +Using these instructions gave me a similar feeling to when I wrote +multi-threaded code for the first time: it looks scary at first +because of the weird syntax, but it was actually pretty easy to do +what I wanted once I got started. + +## Testing + +It sounds completely trivial now, but I did not do much testing +in my previous personal projects. But rewriting from scratch allowed +me to do the right thing from the start and implement a proper +testing suite. + +I used different types of tests at different stage of the development. +In ealy stages, when I was building the core routines of the library, +I relied more on [unit tests](https://en.wikipedia.org/wiki/Unit_testing), +and even did some +[test-driven development](https://en.wikipedia.org/wiki/Test-driven_development). +These tests ran always in debug mode with *sanitizers* switched on (see +above) and enabled me to catch errors early and be confident that, once +the tests passed, the code was correct. On later stages this strategy +was hard to apply, because the high-level functions required working +with large arrays of data that were multiple gigabytes in size; they +could not be tested in isolation. + +Being completely nuts as I am, I set up the whole testing system +from the ground up, but this turned out to be extremely practical and also +quite powerful in its simplicity. In a folder called `test` I had a +`test.sh` file and one sub-folder for each test. Each test consists +of a (usually very short) `.c` file and one or more *test cases*, +i.e. a pair of `file.in` (input file) and `file.out` (expected output) +files. The `test.sh` script, which I call with `make test`, builds and +runs each test case and compares the result to the expected output using +diff(1). Optionally, one case use an environment variable to pick which +test(s) to run. + +If you are wondering why I set this whole thing up and I did not use +some test framework, that's a fair question. The answer is that I like to +understand every part of something I build, and I have quite a distaste +for external dependencies. Moreover, I was not in a rush to complete +this project - I was happy to build it piece by piece and possibly go +back and change everything if I discovered a better way to do things. + +By doing everything by hand I ended up re-discovering many well-known +tricks on my own, such as [callback functions](../2024-06-20-callback-log) +and [some macro magic](../2023-11-14-test-visibility-c-macro). + +But in the end it does not matter that much what type of tests or what +framework one uses, as long as they are automated and easy to run. + +## Structuring the project as a library + +This one is a rare instance of learning something not by banging my head +against a wall until either cracks, but instead by making a good call +intuitively early on and confirming it empirically. + +When I started the rewrite I had in the the back of my mind the idea +of building multiple UIs for this tool, possibly using different +languages and frameworks. So I decided to structure the core of the project +as a library *with a very thin interface*. This meant not exposing many +functions, types or constants to simplify writing adapter code for +other languages. Indeed I avoided exposing *any* type at all and only +used standard integer types and strings. This made it easy to implement +Python bindings (see below) and at the same time it gave me more freedom +to reorganize the code internally without breaking the API. + +Another thing that I did was deferring all I/O and memory management +to the *user* of the library. This implied for example using +[callback functions](../2024-06-20-callback-log) for logging and, +as discussed at the beginning of this post, returning string output via a +`char *` buffer provided by the caller. This forced me to write in a very +clean and safe style - there is only one `malloc()` in the core library! + +## Interfacing with Python + +This project was developed as a library, but I still needed to run the +code to test it, of course. To do this, at first I developed a simple +command line tool to call this library's functions directly. However +parsing command line options by hand can easily become messy. + +At some point I thought it would be cool to write some Python adapters +for my library and use the Python REPL instead of my custom CLI. Not only +would this be more powerful because of the ability to run arbitrary code, +it would also be easier to write and maintain. It was something new for +me, but once again it turned to be not too hard. I wrote about the first +steps in [a blog post](..2024-10-08-python-c). + +## Accepting contributors + +Around April last year a friend asked me if he could work on this +project as part of his bachelor thesis in computer science. And I said +no, because I wanted this to be my personal, slow-paced project and I +could not assure him that it would be in a stable state while he was +working on his thesis. So I made +[a spin-off project](https://git.tronto.net/cubecore) +and told him he could base his work on that. I offered to help if he +had any questions on the topice, and that was it. + +Bur when he asked again to contribute a few months later, the situation +had changed. The project was in a stable state and most of the preliminary +work was done, so I decided to let him have fun and implement one of the +core parts of the tool. I started reviewing his contributions, and I had +to find the right balance between keeping the code consistent with what +I had mind and let him submit his code without requesting unnecessary +changes. This is something I regularly do for work, but doing this for +a project of which I am the creator and main author felt much different! + +Besides the learning experience, this also had some technical benefits, +mostly related to the fact that we are using different compilers, +operating system and CPU architectures: now I have a library that compiles +with both GCC and Clang, runs on Linux and MacOS, and is optimized for +both x86 with +[AVX2 extensions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2) +and ARM with +[NEON](https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)). + +## Conclusion + +Rewriting from scratch is rarely the fastest way to fix a project, +but it feels nice and, as this post shows, it is a great opportunity to +learn cool new stuff. diff --git a/src/blog/2025-04-05-learnt-rewrite/learned-rewrite.md b/src/blog/2025-04-05-learnt-rewrite/learned-rewrite.md @@ -1,444 +0,0 @@ -# Things I learned rewriting a project from scratch - -About two years ago I [wrote](../2023-04-10-the-big-rewrite) about a -project I have been working on and off for a few years. In that post -I explain how and why I decided to rewrite it from scracth, splitting -it in multiple parts. I ended up not following exactly the plan that I -laid out, and I also worked on the rewrite more slowly than I wanted. -But rewriting from scratch allowed me to experiment with new ideas and -discover new tricks, and after two years I learned enough stuff that I -can make a blog post about it! - -The project I am talking about is [Nissy](https://nissy.tronto.net), a -command-line Rubik's Cube solver that is quite popular in the very small -niche of speedcubers interested in the *Fewest Moves Challenge*, which -consists in solving the puzzle with the smallest amount of moves rather -than as fast as possible. Both the old and the new versions are written -in C, but not all the things I discuss in this post are language-specific. - -You can find the new version of this project, the "Big Rewrite", -in [this repository](https://git.tronto.net/h48) - or -[on github](https://github.com/sebastianotronto/h48), if you prefer. -I decided to rename it "H48" back when I planned to have separate projects -for the different features of the old version, but I may change it back -to Nissy at some point. - -But now let's get into it! - -## Better memory safety (yes, even in C) - -Let's start with a mistake that I think many inexperienced C programmers -make: in the previous iterations of this project, I used a lot of -`malloc()`s. And although I was quite diligent in always freeing the -memory I used, in most cases I would have been better off not using -dynamic memory allocation at all. - -As an example, let's consider the case of converting some data structure -into a string. One way to do it would be something like this: - -``` -char *mytype_to_string(const mytype_t *x) { - const size_t str_size = 512; - char *str = malloc(str_size); - - /* Write stuff to str here */ - - return str; -} -``` - -If you are familiar with -[garbage-collected languages](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)), -you may find the code above nice and clean. However, since with C we have -to manage our memory manually, the caller of this function would have -to remember to `free()` the pointer returned by `my_type_to_string()`. -This is not rocket science, but it is a common source of mistakes. - -However, a safer pattern for this kind of function would be the following: - -``` -int mytype_to_string(const mytype_t *x, size_t len, char buffer[len]) { - const size_t str_size = 512; - if (len < str_size) { - /* Handle error in some way, for example by returning -1 */ - } - - /* Write stuff to buffer here */ - - return 0; /* Or return number of bytes written */ -} -``` - -In this new version of the code, instead of returning a string, the -function takes a `char` buffer and its lenght as a parameter. Notice -that we are **not** passing the buffer by value, as the notation -`buffer[len]` might suggest: this function's signature is equivalent to -`my_type_to_string(const mytype_t *, size_t, char *)`, but by using the -square-bracket notation the compiler may be able to catch a too-small -buffer and warn us about it. - -Now we can get rid of the malloc entirely, because the caller can -just allocate the char buffer on its stack: - -``` -void doing_stuff(mytype_t x) { - const size_t len = 1024; - char str[len]; - if (mytype_to_string(&x, len, str) != 0) { - /* Handle error */ - } - - /* ... */ -} -``` - -Unfortuntely, this new pattern leads to a small problem: the caller -must now be aware of the number of characters required for converting -`mytype_t` to a string. There are a couple of ways to solve this problem, -including handling a `NULL` first parameter as a "dry-run" request that -returns the required size without writing anything, or using a global -constant. - -This brings me to another feature I have started using more and more -recently. Say we want to use a global constant for the size of `mytype_t`. -We could do something like this: - -``` -#define MYTYPE_SIZE 512 - -int mytype_to_string(const mytype_t *x, char buffer[static MYTYPE_SIZE]) { - /* Write to buffer here, no error handling needed */ - - return 0; /* Or return number of bytes written */ -} - -void doing_stuff(mytype_t x) { - char str[MYTYPE_SIZE]; - mytype_to_string(&x, str); - - /* ... */ -} -``` - -Notice how I used `[static MYTYPE_SIZE]` to denote the required size -of the buffer parameter. This tells the compiler that the function -`mytype_to_string()` can assume that the buffer must reference an area -of memory of size at least `MYTYPE_SIZE`. In turn, the compiler can -use this information not only to optimize the code, but also to give -useful warnings. This trick can also be used to tell the compiler that -a pointer cannot be `NULL`: a pointer is not `NULL` if it refers to an -valid area of memory of size at least 1. So we can declare a function -as follows: - -``` -int mytype_to_string(const mtytype_t x[static 1], char buffer[static MYTYPE_SIZE]) { - /* ... */ -} -``` - -With this we can come up with a set of rules to never de-reference a -NULL pointer. A pointer must be either: - -1. Obtained by de-referencing an object allocated in current function. -2. A parameter declared as `[static 1]`. -3. Checked for `NULL` before de-referencing. - -Having more memory safety baked into the language would be more convenient, -but at least we can use some mitigations. - -## Sanitizers - -Another tool that help me improve the memory management of my code are -[sanitizers](https://en.wikipedia.org/wiki/Code_sanitizer). -They were suggested to me by [Tomas Rokicki](https://tomas.rokicki.com) -when I shared my struggles with -[an old bug](../2023-05-05-debug-smartphone), and now that I got used -to them I am wondering how I even survived programming in C without them. -Sanitizers are *compiler intrumentations* supported by the major C and -C++ compilers - namely GCC and Clang. They add some runtime checks to -a program that make it crash with a clear message if they detect -a memory error, such as an access outside the bounds of an array, or a -race condition. For example, say you try to access some out-of-bound -element of an array: - -``` -/* This is a.c */ - -#include <stdio.h> - -int main() { - int a[4] = {0, 1, 2, 3}; - - printf("Out-of-bound value: %d\n", a[4]); -} -``` - -Normally, this program just runs fine and prints some garbage value: - -``` -$ cc a.c && ./a.out -Out-of-bound value: 1573265008 -``` - -Notice how the compiler did not even give us a warning! -However, if we compile with the address sanitizer enabled: - -``` -$ gcc -fsanitize=address a.c -$ ./a.out -================================================================= -==15152==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fc4e2500030 at pc 0x000000401331 bp 0x7ffdffd953c0 sp 0x7ffdffd953b8 -READ of size 4 at 0x7fc4e2500030 thread T0 - #0 0x401330 in main (/home/sebastiano/a.out+0x401330) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) - #1 0x7fc4e4a10247 in __libc_start_call_main (/lib64/libc.so.6+0x3247) (BuildId: f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93) - #2 0x7fc4e4a1030a in __libc_start_main_alias_1 (/lib64/libc.so.6+0x330a) (BuildId: f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93) - #3 0x4010d4 in _start (/home/sebastiano/a.out+0x4010d4) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) - -Address 0x7fc4e2500030 is located in stack of thread T0 at offset 48 in frame - #0 0x4011a5 in main (/home/sebastiano/a.out+0x4011a5) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) - - This frame has 1 object(s): - [32, 48) 'a' (line 4) <== Memory access at offset 48 overflows this variable -HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork - (longjmp and C++ exceptions *are* supported) -SUMMARY: AddressSanitizer: stack-buffer-overflow (/home/sebastiano/a.out+0x401330) (BuildId: e8411facf942864956c817b9be7bee9868ec6065) in main -Shadow bytes around the buggy address: - 0x7fc4e24ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - 0x7fc4e24ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - 0x7fc4e24ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - 0x7fc4e24fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - 0x7fc4e24fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 -=>0x7fc4e2500000: f1 f1 f1 f1 00 00[f3]f3 00 00 00 00 00 00 00 00 - 0x7fc4e2500080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - 0x7fc4e2500100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - 0x7fc4e2500180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - 0x7fc4e2500200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - 0x7fc4e2500280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 -Shadow byte legend (one shadow byte represents 8 application bytes): - Addressable: 00 - Partially addressable: 01 02 03 04 05 06 07 - Heap left redzone: fa - Freed heap region: fd - Stack left redzone: f1 - Stack mid redzone: f2 - Stack right redzone: f3 - Stack after return: f5 - Stack use after scope: f8 - Global redzone: f9 - Global init order: f6 - Poisoned by user: f7 - Container overflow: fc - Array cookie: ac - Intra object redzone: bb - ASan internal: fe - Left alloca redzone: ca - Right alloca redzone: cb -==15152==ABORTIN -``` - -A bit overwhelming, but I certainly prefer this over getting random -garbage results! - -Similarly, one can check for undefined behavior (`-fsanitize=undefined`), -data races (`-fsanitize=thread`), memory leaks (`-fsanitize=leak`) and -more. If you never used sanitizers, you should start now! - -Sanitizers is that they are *runtime* checks. This means that your -program running fine with sanitizers enabled does not guarantee the -absence of errors. Moreover, a program compiled with sanitizers enabled -will run much slower than normal - so you should only use them in your -debug builds. - -It would be amazing to have this kind of checks performed at compile -time - I wonder if this is this "Rust" thing all the cool kids are -talking about? - -## Data races and atomic types - -Before starting this project I knew the basics of multi-threaded -programming with [pthreads](https://en.wikipedia.org/wiki/Pthreads). But -I had (and probably still have) many things to learn. - -For example, I used to think that having a thread read a value that -another thread is possibly writing at the same time could be -done safely, without using any -[lock](https://en.wikipedia.org/wiki/Lock_(computer_science)). - -Assuming you don't care if the value you read comes from before or after -the write from the other thread, which was my case, this sounds like -a reasonable assumption. It turns out that this assumption is called -*sequential consistency*, and that it is almost always wrong. - -One way to confirm that a concurrent read + write is undefined -behavior in C and C++ is compiling with `-fsanitize=thread`, which -will rightfully scream at you if one such data race happens. Luckily, -it is not necessary to use a mutex for this kind of operation: -starting from C11 and C++11 you can just declare your variable -[`_Atomic`](https://en.cppreference.com/w/c/language/atomic) and happily -read and modify it at the same time. This comes at a performance cost, -but this cost is smaller than that of a traditional lock. - -## Single instruction, multiple data - -[SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) -is a type of parallelism implemented in pretty much every CPU made in -the last 20 years or so. You can call SIMD instructions directly from -high-level code using *vector intrinsics*. For example, the following -code computes four sums of 64-bit integers at the same time: - -``` -/* Compile with -mavx2 option */ - -#include <stdio.h> -#include <immintrin.h> - -int main() { - long long r[4]; - - /* Initialize the 4x 64-bit int data structures */ - __m256i a = _mm256_set_epi64x(23, 42, 1234567890123456789, -1000); - __m256i b = _mm256_set_epi64x(-1, 69, -1234567890123456789, 3); - - /* Compute the sums */ - __m256i c = _mm256_add_epi64(a, b); - - /* Store back the result into a buffer */ - _mm256_storeu_si256((__m256i *)r, c); - - /* Print the results (last is first) */ - printf("Results:\n%lld\n%lld\n%lld\n%lld\n", r[0], r[1], r[2], r[3]); -} - -``` - -The `__mm256_add_epi64()` used above is not just syntactic -sugar: the compiler translates it directly to a single -`vpaddq` assembly instruction. You can check this on -[godbolt](https://godbolt.org/z/d6TjPzKPG), or by compiling the code -above with `gcc -S -mavx2 file.c`. - -Adding numbers is not the only thing that can be done with SIMD: there -are also instructions for other operations, including logic operators -and permutations of blocks of bits. The latter was particularly relevant -to me, this project being a Rubik's cube solver. - -Using these instructions gave me a similar feeling to when I wrote -multi-threaded code for the first time: it looks scary at first -because of the weird syntax, but it was actually pretty easy to do -what I wanted once I got started. - -## Testing - -It sounds completely trivial now, but I did not do much testing -in my previous personal projects. But rewriting from scratch allowed -me to do the right thing from the start and implement a proper -testing suite. - -I used different types of tests at different stage of the development. -In ealy stages, when I was building the core routines of the library, -I relied more on [unit tests](https://en.wikipedia.org/wiki/Unit_testing), -and even did some -[test-driven development](https://en.wikipedia.org/wiki/Test-driven_development). -These tests ran always in debug mode with *sanitizers* switched on (see -above) and enabled me to catch errors early and be confident that, once -the tests passed, the code was correct. On later stages this strategy -was hard to apply, because the high-level functions required working -with large arrays of data that were multiple gigabytes in size; they -could not be tested in isolation. - -Being completely nuts as I am, I set up the whole testing system -from the ground up, but this turned out to be extremely practical and also -quite powerful in its simplicity. In a folder called `test` I had a -`test.sh` file and one sub-folder for each test. Each test consists -of a (usually very short) `.c` file and one or more *test cases*, -i.e. a pair of `file.in` (input file) and `file.out` (expected output) -files. The `test.sh` script, which I call with `make test`, builds and -runs each test case and compares the result to the expected output using -diff(1). Optionally, one case use an environment variable to pick which -test(s) to run. - -If you are wondering why I set this whole thing up and I did not use -some test framework, that's a fair question. The answer is that I like to -understand every part of something I build, and I have quite a distaste -for external dependencies. Moreover, I was not in a rush to complete -this project - I was happy to build it piece by piece and possibly go -back and change everything if I discovered a better way to do things. - -By doing everything by hand I ended up re-discovering many well-known -tricks on my own, such as [callback functions](../2024-06-20-callback-log) -and [some macro magic](../2023-11-14-test-visibility-c-macro). - -But in the end it does not matter that much what type of tests or what -framework one uses, as long as they are automated and easy to run. - -## Structuring the project as a library - -This one is a rare instance of learning something not by banging my head -against a wall until either cracks, but instead by making a good call -intuitively early on and confirming it empirically. - -When I started the rewrite I had in the the back of my mind the idea -of building multiple UIs for this tool, possibly using different -languages and frameworks. So I decided to structure the core of the project -as a library *with a very thin interface*. This meant not exposing many -functions, types or constants to simplify writing adapter code for -other languages. Indeed I avoided exposing *any* type at all and only -used standard integer types and strings. This made it easy to implement -Python bindings (see below) and at the same time it gave me more freedom -to reorganize the code internally without breaking the API. - -Another thing that I did was deferring all I/O and memory management -to the *user* of the library. This implied for example using -[callback functions](../2024-06-20-callback-log) for logging and, -as discussed at the beginning of this post, returning string output via a -`char *` buffer provided by the caller. This forced me to write in a very -clean and safe style - there is only one `malloc()` in the core library! - -## Interfacing with Python - -This project was developed as a library, but I still needed to run the -code to test it, of course. To do this, at first I developed a simple -command line tool to call this library's functions directly. However -parsing command line options by hand can easily become messy. - -At some point I thought it would be cool to write some Python adapters -for my library and use the Python REPL instead of my custom CLI. Not only -would this be more powerful because of the ability to run arbitrary code, -it would also be easier to write and maintain. It was something new for -me, but once again it turned to be not too hard. I wrote about the first -steps in [a blog post](..2024-10-08-python-c). - -## Accepting contributors - -Around April last year a friend asked me if he could work on this -project as part of his bachelor thesis in computer science. And I said -no, because I wanted this to be my personal, slow-paced project and I -could not assure him that it would be in a stable state while he was -working on his thesis. So I made -[a spin-off project](https://git.tronto.net/cubecore) -and told him he could base his work on that. I offered to help if he -had any questions on the topice, and that was it. - -Bur when he asked again to contribute a few months later, the situation -had changed. The project was in a stable state and most of the preliminary -work was done, so I decided to let him have fun and implement one of the -core parts of the tool. I started reviewing his contributions, and I had -to find the right balance between keeping the code consistent with what -I had mind and let him submit his code without requesting unnecessary -changes. This is something I regularly do for work, but doing this for -a project of which I am the creator and main author felt much different! - -Besides the learning experience, this also had some technical benefits, -mostly related to the fact that we are using different compilers, -operating system and CPU architectures: now I have a library that compiles -with both GCC and Clang, runs on Linux and MacOS, and is optimized for -both x86 with -[AVX2 extensions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2) -and ARM with -[NEON](https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)). - -## Conclusion - -Rewriting from scratch is rarely the fastest way to fix a project, -but it feels nice and, as this post shows, it is a great opportunity to -learn cool new stuff.