commit d5c60236c06e67e8da45e63843b37e5d619e9d7b
parent ff68d44aefada070b756a1bdbb0d4d722779440d
Author: Sebastiano Tronto <sebastiano@tronto.net>
Date:   Wed, 17 Dec 2025 20:54:23 +0100

Improve performance of H48 by 10-30% using interleaved fallback tables.

(See the last 12 commits or so)

H48 now uses interleaved fallback tables, similarly to nxopt / vcube.
This allowed us to simplify the code (we no longer use k != 2 for H48)
and gives a nice performance improvement, although repeated measurements
show somewhat inconsistent results. The actual speed-up depends on the
table, and it is more pronounced on larger versions of the solver.

The nasty part about using interleaved tables is that they are most
efficient when aligned in memory to 512 bits, but the core library
defers memory management to the implementor, so there is no way to
ensure this. For example, any application that wants to save the tables
to a file and then re-load them on a subsequent run (such as our shell
and tools) should make sure to load the data into a 512-bit aligned
memory buffer. The best we can do on the library side is to have our
main lookup table 512-bit aligned within the whole solver data (which
includes e.g. cocsepdata and a preamble). See the sketches at the end
of this message.

I have also moved some conditionals around in the various checks in the
search DFS, hoping to improve performance further, but the effect has
been barely noticeable.

The benchmarks have been updated. Moreover, there is now a way to
re-run them with a single script (see the updates in the benchmarks
folder for details).

Further attempts at optimizing the code via known techniques (such as
prefetching) have failed, but I'll come back to this.
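
To illustrate the caller-side requirement, here is a minimal sketch in C
of how an application could re-load saved tables into a 512-bit
(64-byte) aligned buffer. The function name and the file handling are
assumptions for illustration only, not the actual shell / tools code:

    /*
     * Sketch only (hypothetical helper, not the nissy API): read
     * pre-generated solver data from a file into a 64-byte (512-bit)
     * aligned buffer, so the interleaved tables inside keep their
     * intended alignment.
     */
    #include <stdio.h>
    #include <stdlib.h>

    static void *
    read_tables_aligned(const char *path, size_t *size_out)
    {
    	FILE *f;
    	long size;
    	void *buf;

    	if ((f = fopen(path, "rb")) == NULL)
    		return NULL;

    	/* Determine the file size */
    	fseek(f, 0, SEEK_END);
    	size = ftell(f);
    	rewind(f);

    	/* Round the allocation up to a multiple of 64 bytes, as
    	   required by aligned_alloc(), and align it to 64 bytes. */
    	buf = aligned_alloc(64, ((size_t)size + 63) & ~(size_t)63);
    	if (buf == NULL ||
    	    fread(buf, 1, (size_t)size, f) != (size_t)size) {
    		free(buf);
    		fclose(f);
    		return NULL;
    	}

    	fclose(f);
    	*size_out = (size_t)size;
    	return buf;
    }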
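
And a sketch of the library-side idea, with hypothetical names: pad the
offsets inside the solver data so that the main H48 table starts on a
64-byte boundary, assuming the whole buffer is itself 512-bit aligned
by the caller.

    #include <stdint.h>

    /* Round x up to the next multiple of 64 bytes. */
    #define ALIGN64(x) (((x) + UINT64_C(63)) & ~UINT64_C(63))

    /* Offset of the main H48 table within the solver data, placed
       after the preamble and cocsepdata (sizes are illustrative
       parameters, not the actual layout code). */
    uint64_t
    h48data_offset(uint64_t preamble_size, uint64_t cocsepdata_size)
    {
    	return ALIGN64(preamble_size + cocsepdata_size);
    }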