commit 233169d9ab5fb8a796d91a3694f8a522fa54d194 parent 74299c8919cb2e2f6e1678817529e3a730cf6b81 Author: Sebastiano Tronto <sebastiano@tronto.net> Date: Mon, 26 Jan 2026 22:52:15 +0100 Improve multithreading benchmarks (compare speedup factor, not with vcube) Diffstat:
21 files changed, 316 insertions(+), 225 deletions(-)
diff --git a/benchmarks/benchmarks.md b/benchmarks/benchmarks.md @@ -38,21 +38,6 @@ may be particularly inaccurate, because writing the solution (together with some log messages) to standard output may take a significant portion of the time, depending on the terminal emulator used and other factors. -The main test we performed was finding a single optimal solution, and we -compared the results with vcube. This test was run in a single-thread -configuration and in two multithread configurations (with 4 and 16 -threads). We also ran a test on finding *all* optimal solutions which, -as far as I know, is a use case not supported by vcube; this latter test -was only run on 16 threads. - -Since the size of the pruning table used by the solver is of utmost -importance, we include two statistics: time per cube and time per cube -*adjusted by table size*. The adjustment we took simply consists in -multiplying the time per cube by the size of the table. Empirically -this leads to more even results across the board, as the speed of the -solvers of the same family seems to scale linearly with the size of the -pruning table. - All benchmark tests were done on the following configuration: * CPU: AMD Ryzen 7 7700 (8 cores, 16 virtual threads) @@ -61,9 +46,29 @@ All benchmark tests were done on the following configuration: * Operating system: Debian 13 (Linux kernel 6.12.57) * Compiler: GCC 14.2.0 for H48 and Clang 19.1.7 for vcube -## Results (click on each item to expand) +## Benchmark results + +### Single solution, single thread + +The first test we performed was finding a single optimal solution, and +we compared the results with vcube. -<details><summary>Single solution, single thread</summary> +This test was run in a single-thread configuration, because the two +solvers use very different strategies for multithreading: vcube can only +parallelize by solving one cube per thread, while H48 is able to take +advantage of multiple threads even when solving a single cube. Therefore +we prefer to compare their single-threaded performance only. + +Since the size of the pruning table used by the solver is of utmost +importance, we include two statistics: time per cube and time per cube +*adjusted by table size*. The adjustment we took simply consists in +multiplying the time per cube by the size of the table. Empirically this +leads to more even results across the board, although the speed of the +solvers of the same family does not scale exactly linearly with the size +of the pruning table. + +<!-- The following details block can be found in benchmarks/single_thread_comparison.md --> +<details><summary>Results: Single solution, single thread</summary> Time per cube (in seconds, lower is better). @@ -103,95 +108,92 @@ Time per cube adjusted for table size (in seconds \* GiB, lower is better). <img src="img/20moves1thread.png"> </details> -<details><summary>Single solution, 4 threads</summary> +As we can see, adjusting for table size, H48 is generally faster than +vcube. The gap between the two solvers is larger for scrambles with +longer optimal solutions. + +### Single solution, multiple threads + +The same benchmark as before is repeated using 4 and 16 threads (recall +that the CPU used for these tests has 8 physical cores and 16 virtual +threads). + +As mentioned above, we don't compare these results to vcube. Instead, +we compare them with the single-threaded results for H48 and we show +how far the speedup factor is from a theoretically optimal 4x and 16x. + +<!-- The following details block can be found in benchmarks/tables_4_threads.md --> +<details><summary>Results: Single solution, 4 threads</summary> Time per cube (in seconds, lower is better). | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| |:---------|:-------|-------:|-------:|-------:|-------:|--------:| -|vcube 212 |58.2 GiB| 0.03| 0.27| 1.04| 7.70| (a) | |H48 h11 |56.5 GiB| 0.04| 0.14| 0.60| 3.97| 4.19| -|vcube 404 |31.8 GiB| 0.07| 0.30| 1.65| 16.17| (a) | |H48 h10 |28.3 GiB| 0.05| 0.20| 0.91| 6.65| 9.81| -|vcube 308 |21.2 GiB| 0.05| 0.35| 1.78| 16.61| (a) | |H48 h9 |14.1 GiB| 0.07| 0.39| 1.74| 12.54| 18.96| -|vcube 208 | 7.3 GiB| 0.16| 1.47| 5.86| | (a) | |H48 h8 | 7.1 GiB| 0.13| 0.90| 3.71| | | |H48 h7 | 3.5 GiB| 0.17| 1.28| 6.12| | | -|vcube 112 | 2.4 GiB| 0.29| 3.13| 11.95| | (a) | |H48 h6 | 1.8 GiB| 0.33| 2.47| 12.22| | | -Time per cube adjusted for table size (in seconds \* GiB, lower is better). +Speed-up factor (higher is better). | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| |:---------|:-------|-------:|-------:|-------:|-------:|--------:| -|vcube 212 |58.2 GiB| 2.03| 15.69| 60.25| 447.97| (a) | -|H48 h11 |56.5 GiB| 2.04| 8.17| 33.77| 224.59| 236.92| -|vcube 404 |31.8 GiB| 2.32| 9.50| 52.46| 514.50| (a) | -|H48 h10 |28.3 GiB| 1.37| 5.76| 25.83| 187.88| 277.18| -|vcube 308 |21.2 GiB| 1.02| 7.52| 37.82| 352.36| (a) | -|H48 h9 |14.1 GiB| 1.03| 5.45| 24.58| 177.17| 267.79| -|vcube 208 | 7.3 GiB| 1.18| 10.69| 42.63| | (a) | -|H48 h8 | 7.1 GiB| 0.92| 6.39| 26.20| | | -|H48 h7 | 3.5 GiB| 0.62| 4.53| 21.62| | | -|vcube 112 | 2.4 GiB| 0.69| 7.59| 28.97| | (a) | -|H48 h6 | 1.8 GiB| 0.59| 4.37| 21.61| | | - -(a) vcube cannot parallelize on a single scramble, the results for the -superflip are going to be the same as in the single thread case. - -<img src="img/17moves4threads.png"> -<img src="img/18moves4threads.png"> -<img src="img/19moves4threads.png"> -<img src="img/20moves4threads.png"> +|H48 h11 |56.5 GiB| 2.55| 3.46| 3.75| 3.96| 3.71| +|H48 h10 |28.3 GiB| 3.09| 3.71| 3.67| 3.54| 3.88| +|H48 h9 |14.1 GiB| 3.25| 3.84| 3.85| | | +|H48 h8 | 7.1 GiB| 3.53| 3.72| 3.81| | | +|H48 h7 | 3.5 GiB| 3.60| 3.78| 3.80| | | +|H48 h6 | 1.8 GiB| 3.77| 3.82| 3.79| | | +<img src="img/4threads.png"> + </details> -<details><summary>Single solution, 16 threads</summary> +<!-- The following details block can be found in benchmarks/tables_16_threads.md --> +<details><summary>Results: Single solution, 16 threads</summary> Time per cube (in seconds, lower is better). | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| |:---------|:-------|-------:|-------:|-------:|-------:|--------:| -|vcube 212 |58.2 GiB| 0.02| 0.13| 0.45| 2.84| (a) | |H48 h11 |56.5 GiB| 0.02| 0.06| 0.22| 1.33| 1.84| -|vcube 404 |31.8 GiB| 0.04| 0.14| 0.65| 6.08| (a) | |H48 h10 |28.3 GiB| 0.03| 0.08| 0.33| 2.34| 4.18| -|vcube 308 |21.2 GiB| 0.03| 0.19| 0.78| 6.67| (a) | |H48 h9 |14.1 GiB| 0.04| 0.15| 0.64| 4.45| 8.09| -|vcube 208 | 7.3 GiB| 0.08| 0.79| 2.43| | (a) | |H48 h8 | 7.1 GiB| 0.06| 0.34| 1.36| | | |H48 h7 | 3.5 GiB| 0.07| 0.47| 2.20| | | -|vcube 112 | 2.4 GiB| 0.15| 1.63| 5.10| | (a) | |H48 h6 | 1.8 GiB| 0.13| 0.91| 4.39| | | -Time per cube adjusted for table size (in seconds \* GiB, lower is better). +Speed-up factor (higher is better). | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| |:---------|:-------|-------:|-------:|-------:|-------:|--------:| -|vcube 212 |58.2 GiB| 0.95| 7.83| 26.04| 165.03| (a) | -|H48 h11 |56.5 GiB| 1.34| 3.48| 12.19| 74.93| 103.98| -|vcube 404 |31.8 GiB| 1.21| 4.60| 20.76| 193.43| (a) | -|H48 h10 |28.3 GiB| 0.74| 2.37| 9.46| 66.13| 118.23| -|vcube 308 |21.2 GiB| 0.67| 4.01| 16.48| 141.49| (a) | -|H48 h9 |14.1 GiB| 0.51| 2.15| 8.98| 62.92| 114.31| -|vcube 208 | 7.3 GiB| 0.56| 5.78| 17.68| | (a) | -|H48 h8 | 7.1 GiB| 0.40| 2.40| 9.59| | | -|H48 h7 | 3.5 GiB| 0.26| 1.66| 7.77| | | -|vcube 112 | 2.4 GiB| 0.35| 3.95| 12.37| | (a) | -|H48 h6 | 1.8 GiB| 0.23| 1.61| 7.76| | | - -(a) vcube cannot parallelize on a single scramble, the results for the -superflip are going to be the same as in the single thread case. - -<img src="img/17moves16threads.png"> -<img src="img/18moves16threads.png"> -<img src="img/19moves16threads.png"> -<img src="img/20moves16threads.png"> +|H48 h11 |56.5 GiB| 3.90| 8.14| 10.39| 11.86| 8.45| +|H48 h10 |28.3 GiB| 5.72| 9.00| 10.03| 10.04| 9.09| +|H48 h9 |14.1 GiB| 6.56| 9.75| 10.53| | | +|H48 h8 | 7.1 GiB| 8.08| 9.91| 10.40| | | +|H48 h7 | 3.5 GiB| 8.59| 10.30| 10.57| | | +|H48 h6 | 1.8 GiB| 9.60| 10.39| 10.54| | | +<img src="img/16threads.png"> + </details> -<details><summary>All solutions, 16 threads</summary> +We can see that H48 scales pretty well with 4 threads, getting close +to the 4x theoretical maximum speedup in slower cases (small table or +long solutions). + +In the 16 threads benchmark shows that, although the virtual threads +help push us beyond the 8x theoretical speedup that would be provided +by the 8 cores, we are nowhere near a 16x speedup. + +### All solutions -*Note: vcube does not have an option for finding multiple solutions.* +Finally, we ran a test on finding *all* optimal solutions which, as +far as I know, is a use case not supported by vcube. For convenience, +this test is only run on 16 threads. + +<!-- The following details block can be found in benchmarks/all_solutions.md --> +<details><summary>Results: All solutions (16 threads)</summary> Time per cube (in seconds, lower is better). @@ -217,16 +219,6 @@ Time per cube adjusted for table size (in seconds \* GiB, lower is better). </details> -## Comments on the results - -* Adjusting for table size, H48 is generally faster than vcube. -* The gap between the two solvers is larger for scrambles with - longer optimal solutions. -* On sets of 25 scrambles, H48 performs better than vcube when using - multiple threads, compared to their single-threaded performance. - However, this advantage will likely disappear (or - even invert) if we increase the size of the set. - ## Other notes * To repeat the benchmarks, use `./benchmarks/run-h48-benchmarks.sh`. @@ -249,8 +241,3 @@ Time per cube adjusted for table size (in seconds \* GiB, lower is better). * For H48, both GCC and Clang have been tried, with the same options; the resulting executable was about 10% faster with GCC compared to Clang. vcube only supports compiling with Clang. -* The performance of the H48 solver depends slightly, but measurably, on the - alignment of the pruning table in memory. This is not handled by the core - library, but by the program that uses it. For these tests, we have used as - reference implementation the program in `tools/301_solve_file`, which - ensures 64-byte alignment. diff --git a/benchmarks/img/16threads.png b/benchmarks/img/16threads.png Binary files differ. diff --git a/benchmarks/img/17moves16threads.png b/benchmarks/img/17moves16threads.png Binary files differ. diff --git a/benchmarks/img/17moves1thread.png b/benchmarks/img/17moves1thread.png Binary files differ. diff --git a/benchmarks/img/17moves4threads.png b/benchmarks/img/17moves4threads.png Binary files differ. diff --git a/benchmarks/img/18moves16threads.png b/benchmarks/img/18moves16threads.png Binary files differ. diff --git a/benchmarks/img/18moves1thread.png b/benchmarks/img/18moves1thread.png Binary files differ. diff --git a/benchmarks/img/18moves4threads.png b/benchmarks/img/18moves4threads.png Binary files differ. diff --git a/benchmarks/img/19moves16threads.png b/benchmarks/img/19moves16threads.png Binary files differ. diff --git a/benchmarks/img/19moves1thread.png b/benchmarks/img/19moves1thread.png Binary files differ. diff --git a/benchmarks/img/19moves4threads.png b/benchmarks/img/19moves4threads.png Binary files differ. diff --git a/benchmarks/img/20moves16threads.png b/benchmarks/img/20moves16threads.png Binary files differ. diff --git a/benchmarks/img/20moves1thread.png b/benchmarks/img/20moves1thread.png Binary files differ. diff --git a/benchmarks/img/20moves4threads.png b/benchmarks/img/20moves4threads.png Binary files differ. diff --git a/benchmarks/img/4threads.png b/benchmarks/img/4threads.png Binary files differ. diff --git a/benchmarks/plot-benchmarks.py b/benchmarks/plot-benchmarks.py @@ -1,6 +1,14 @@ import matplotlib.pyplot as plt import results_h48 import results_vcube +from pathlib import Path + +benchmarks_dir = Path("benchmarks") +benchmarks_img_dir = benchmarks_dir / "img" +benchmarks_single_thread = benchmarks_dir / "tables_1_thread.md" +benchmarks_4_threads = benchmarks_dir / "tables_4_threads.md" +benchmarks_16_threads = benchmarks_dir / "tables_16_threads.md" +benchmarks_all_solutions = benchmarks_dir / "tables_all_solutions.md" # Table sizes in bytes @@ -23,7 +31,7 @@ sizes_vcube = { # Printing tables in markdown format -def print_row(solver_name, solver_size, dict, mul_by_size, superflip_star): +def print_row(f, solver_name, solver_size, dict, mul_by_size, superflip_star): if dict is None: return solver_gib = solver_size / (2**30) @@ -36,104 +44,106 @@ def print_row(solver_name, solver_size, dict, mul_by_size, superflip_star): f"{dict["superflip"]*m:>9.2f}" if "superflip" in dict else s ] sep = "|" - print(sep + sep.join(cols) + sep) + f.write(sep + sep.join(cols) + sep + "\n") -def print_table(h48, vcube, ms, st): +def print_table(f, h48, vcube, ms, st): vcube = vcube or {} - print("| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip|") - print("|:---------|:-------|-------:|-------:|-------:|-------:|--------:|") - print_row("vcube 212", sizes_vcube[212], vcube.get(212), ms, not st) - print_row("H48 h11", sizes_h48[11], h48[11], ms, False) - print_row("vcube 404", sizes_vcube[404], vcube.get(404), ms, not st) - print_row("H48 h10", sizes_h48[10], h48[10], ms, False) - print_row("vcube 308", sizes_vcube[308], vcube.get(308), ms, not st) - print_row("H48 h9", sizes_h48[9], h48[9], ms, False) - print_row("vcube 208", sizes_vcube[208], vcube.get(208), ms, not st) - print_row("H48 h8", sizes_h48[8], h48[8], ms, False) - print_row("H48 h7", sizes_h48[7], h48[7], ms, False) - print_row("vcube 112", sizes_vcube[112], vcube.get(112), ms, not st) - print_row("H48 h6", sizes_h48[6], h48[6], ms, False) - -print("<details><summary>Single solution, single thread</summary>") -print() -print("Time per cube (in seconds, lower is better).") -print() -print_table(results_h48.h48_single_thread, results_vcube.vcube_single_thread, False, True) -print() -print("Time per cube adjusted for table size (in seconds \\* GiB, lower is better).") -print() -print_table(results_h48.h48_single_thread, results_vcube.vcube_single_thread, True, True) -print() -print( - '<img src="img/17moves1thread.png">\n' - '<img src="img/18moves1thread.png">\n' - '<img src="img/19moves1thread.png">\n' - '<img src="img/20moves1thread.png">' -) -print("</details>") - -print() -print("<details><summary>Single solution, 4 threads</summary>") -print() -print("Time per cube (in seconds, lower is better).") -print() -print_table(results_h48.h48_4_threads, results_vcube.vcube_4_threads, False, False) -print() -print("Time per cube adjusted for table size (in seconds \\* GiB, lower is better).") -print() -print_table(results_h48.h48_4_threads, results_vcube.vcube_4_threads, True, False) -print() -print("(a) vcube cannot parallelize on a single scramble, the results for the") -print("superflip are going to be the same as in the single thread case.") -print() -print( - '<img src="img/17moves4threads.png">\n' - '<img src="img/18moves4threads.png">\n' - '<img src="img/19moves4threads.png">\n' - '<img src="img/20moves4threads.png">' -) -print("</details>") - -print() -print("<details><summary>Single solution, 16 threads</summary>") -print() -print("Time per cube (in seconds, lower is better).") -print() -print_table(results_h48.h48_16_threads, results_vcube.vcube_16_threads, False, False) -print() -print("Time per cube adjusted for table size (in seconds \\* GiB, lower is better).") -print() -print_table(results_h48.h48_16_threads, results_vcube.vcube_16_threads, True, False) -print() -print("(a) vcube cannot parallelize on a single scramble, the results for the") -print("superflip are going to be the same as in the single thread case.") -print() -print( - '<img src="img/17moves16threads.png">\n' - '<img src="img/18moves16threads.png">\n' - '<img src="img/19moves16threads.png">\n' - '<img src="img/20moves16threads.png">' -) -print("</details>") - -print() -print("<details><summary>All solutions, 16 threads</summary>") -print() -print("*Note: vcube does not have an option for finding multiple solutions.*") -print() -print("Time per cube (in seconds, lower is better).") -print() -print_table(results_h48.h48_all_solutions, None, False, False) -print() -print("Time per cube adjusted for table size (in seconds \\* GiB, lower is better).") -print() -print_table(results_h48.h48_all_solutions, None, True, False) -print() -print("</details>") + f.write("| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip|\n") + f.write("|:---------|:-------|-------:|-------:|-------:|-------:|--------:|\n") + print_row(f, "vcube 212", sizes_vcube[212], vcube.get(212), ms, not st) + print_row(f, "H48 h11", sizes_h48[11], h48[11], ms, False) + print_row(f, "vcube 404", sizes_vcube[404], vcube.get(404), ms, not st) + print_row(f, "H48 h10", sizes_h48[10], h48[10], ms, False) + print_row(f, "vcube 308", sizes_vcube[308], vcube.get(308), ms, not st) + print_row(f, "H48 h9", sizes_h48[9], h48[9], ms, False) + print_row(f, "vcube 208", sizes_vcube[208], vcube.get(208), ms, not st) + print_row(f, "H48 h8", sizes_h48[8], h48[8], ms, False) + print_row(f, "H48 h7", sizes_h48[7], h48[7], ms, False) + print_row(f, "vcube 112", sizes_vcube[112], vcube.get(112), ms, not st) + print_row(f, "H48 h6", sizes_h48[6], h48[6], ms, False) + +def print_factor_table(f, slow, fast): + ratio = {} + for m in [6, 7, 8, 9, 10, 11]: + ratio[m] = {} + for k in slow[m]: + ratio[m][k] = (slow[m][k] / fast[m][k]) * (25 if k != "superflip" else 1) + f.write("| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip|\n") + f.write("|:---------|:-------|-------:|-------:|-------:|-------:|--------:|\n") + print_row(f, "H48 h11", sizes_h48[11], ratio[11], False, False) + print_row(f, "H48 h10", sizes_h48[10], ratio[10], False, False) + print_row(f, "H48 h9", sizes_h48[9], ratio[9], False, False) + print_row(f, "H48 h8", sizes_h48[8], ratio[8], False, False) + print_row(f, "H48 h7", sizes_h48[7], ratio[7], False, False) + print_row(f, "H48 h6", sizes_h48[6], ratio[6], False, False) + +with open(benchmarks_single_thread, "w") as f: + f.write(f"<!-- The following details block can be found in {benchmarks_single_thread} -->\n") + f.write("<details><summary>Results: Single solution, single thread</summary>\n") + f.write("\n") + f.write("Time per cube (in seconds, lower is better).\n") + f.write("\n") + print_table(f, results_h48.h48_single_thread, results_vcube.vcube_single_thread, False, True) + f.write("\n") + f.write("Time per cube adjusted for table size (in seconds \\* GiB, lower is better).\n") + f.write("\n") + print_table(f, results_h48.h48_single_thread, results_vcube.vcube_single_thread, True, True) + f.write("\n") + f.write( + '<img src="img/17moves1thread.png">\n' + '<img src="img/18moves1thread.png">\n' + '<img src="img/19moves1thread.png">\n' + '<img src="img/20moves1thread.png">\n' + ) + f.write("</details>\n") + +with open(benchmarks_4_threads, "w") as f: + f.write(f"<!-- The following details block can be found in {benchmarks_4_threads} -->\n") + f.write("<details><summary>Results: Single solution, 4 threads</summary>\n") + f.write("\n") + f.write("Time per cube (in seconds, lower is better).\n") + f.write("\n") + print_table(f, results_h48.h48_4_threads, None, False, False) + f.write("\n") + f.write("Speed-up factor (higher is better).\n") + f.write("\n") + print_factor_table(f, results_h48.h48_single_thread, results_h48.h48_4_threads) + f.write('<img src="img/4threadsspeedupfactor.png">\n') + f.write("\n") + f.write("</details>\n") + +with open(benchmarks_16_threads, "w") as f: + f.write(f"<!-- The following details block can be found in {benchmarks_16_threads} -->\n") + f.write("<details><summary>Results: Single solution, 16 threads</summary>\n") + f.write("\n") + f.write("Time per cube (in seconds, lower is better).\n") + f.write("\n") + print_table(f, results_h48.h48_16_threads, None, False, False) + f.write("\n") + f.write("Speed-up factor (higher is better).\n") + f.write("\n") + print_factor_table(f, results_h48.h48_single_thread, results_h48.h48_16_threads) + f.write('<img src="img/16threadsspeedupfactor.png">\n') + f.write("\n") + f.write("</details>\n") + +with open(benchmarks_all_solutions, "w") as f: + f.write(f"<!-- The following details block can be found in {benchmarks_all_solutions} -->\n") + f.write("<details><summary>Results: All solutions, 16 threads</summary>\n") + f.write("\n") + f.write("Time per cube (in seconds, lower is better).\n") + f.write("\n") + print_table(f, results_h48.h48_all_solutions, None, False, False) + f.write("\n") + f.write("Time per cube adjusted for table size (in seconds \\* GiB, lower is better).\n") + f.write("\n") + print_table(f, results_h48.h48_all_solutions, None, True, False) + f.write("\n") + f.write("</details>\n") # Plotting -def plot(title, hd, vd, key): +def plot_comparison(title, hd, vd, key): d = 1 if key == "superflip" else 25 h48x = [sizes_h48[m]/(2**30) for m in hd.keys() if key in hd[m]] vcubex = [sizes_vcube[m]/(2**30) for m in vd.keys() if key in vd[m]] @@ -148,17 +158,27 @@ def plot(title, hd, vd, key): plt.plot(vcubex, vcubey, "o--", label = "vcube") plt.legend(loc = "right") filename = title.replace(" ", "").replace(",", "") + ".png" - plt.savefig("benchmarks/img/" + filename, dpi=300) + plt.savefig(benchmarks_img_dir / filename, dpi=300) #plt.show() rh, rv = results_h48.h48_single_thread, results_vcube.vcube_single_thread for m in [17, 18, 19, 20]: - plot(f"{m} moves 1 thread", rh, rv, m) - -rh, rv = results_h48.h48_4_threads, results_vcube.vcube_4_threads -for m in [17, 18, 19, 20]: - plot(f"{m} moves 4 threads", rh, rv, m) + plot_comparison(f"{m} moves 1 thread", rh, rv, m) -rh, rv = results_h48.h48_16_threads, results_vcube.vcube_16_threads -for m in [17, 18, 19, 20]: - plot(f"{m} moves 16 threads", rh, rv, m) +def plot_multithread_scatter(title, slow, fast): + plt.clf() + plt.title(title) + plt.xlabel("Moves") + plt.ylabel("Speed-up factor") + x = [17, 18, 19] + plt.xticks(x) + for h in [11, 10, 9, 8, 7, 6]: + y = [slow[h][i] / fast[h][i] for i in x] + plt.scatter(x, y, label=f"H48 h{h}") + plt.legend(loc = "right") + filename = title.replace(" ", "").replace(",", "") + ".png" + plt.savefig(benchmarks_img_dir / filename, dpi=300) + plt.show() + +plot_multithread_scatter("4 threads speedup factor", results_h48.h48_single_thread, results_h48.h48_4_threads) +plot_multithread_scatter("16 threads speedup factor", results_h48.h48_single_thread, results_h48.h48_16_threads) diff --git a/benchmarks/results_h48_clang.py b/benchmarks/results_h48_clang.py @@ -1,36 +0,0 @@ -h48_single_thread = { - 6: {17: 57.7296, 18: 436.0622, 19: 2240.5087}, - 7: {17: 27.8748, 18: 222.3812, 19: 1114.8537}, - 8: {17: 23.6884, 18: 181.7169, 19: 722.1819}, - 9: {17: 10.1115, 18: 72.6935, 19: 328.5458}, - 10: {17: 7.0337, 18: 37.5239, 19: 179.3647, 20: 1267.3059, "superflip": 71.8359}, - 11: {17: 3.9693, 18: 24.2573, 19: 106.8995, 20: 688.5594, "superflip": 32.0008}, -} - -h48_4_threads = { - 6: {17: 16.7353, 18: 129.9696, 19: 663.2930}, - 7: {17: 8.4315, 18: 65.4883, 19: 348.3440}, - 8: {17: 6.9127, 18: 51.8117, 19: 208.8438}, - 9: {17: 3.0985, 18: 21.2920, 19: 95.3467, 20: 711.2927, "superflip": 35.3772}, - 10: {17: 2.0592, 18: 10.6636, 19: 49.9811, 20: 388.0203, "superflip": 18.6665}, - 11: {17: 1.3065, 18: 6.9261, 19: 29.7725, 20: 197.9893, "superflip": 9.1158}, -} - -h48_16_threads = { - 6: {17: 5.6335, 18: 39.2955, 19: 207.5730}, - 7: {17: 2.9328, 18: 21.0462, 19: 104.7917}, - 8: {17: 2.4790, 18: 17.0854, 19: 67.6617}, - 9: {17: 1.1561, 18: 7.0899, 19: 30.5952, 20: 220.9804, "superflip": 13.5599}, - 10: {17: 0.8002, 18: 3.6835, 19: 16.4910, 20: 114.4733, "superflip": 7.6359}, - 11: {17: 0.5475, 18: 2.4681, 19: 9.8224, 20: 60.7893, "superflip": 3.8795}, -} - -h48_all_solutions = { - 6: {17: 18.3998, 18: 223.3861, 19: 2459.7513}, - 7: {17: 9.6264, 18: 114.2752, 19: 1242.9741}, - 8: {17: 7.1793, 18: 81.2145, 19: 783.8113}, - 9: {17: 3.3487, 18: 36.9225, 19: 363.1912}, - 10: {17: 2.0002, 18: 20.4565, 19: 187.5199, 20: 1570.6074, "superflip": 68.2016}, - 11: {17: 1.2833, 18: 12.3616, 19: 105.6184, 20: 872.5879, "superflip": 34.1071}, -} - diff --git a/benchmarks/tables_16_threads.md b/benchmarks/tables_16_threads.md @@ -0,0 +1,27 @@ +<!-- The following details block can be found in benchmarks/tables_16_threads.md --> +<details><summary>Results: Single solution, 16 threads</summary> + +Time per cube (in seconds, lower is better). + +| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| +|:---------|:-------|-------:|-------:|-------:|-------:|--------:| +|H48 h11 |56.5 GiB| 0.02| 0.06| 0.22| 1.33| 1.84| +|H48 h10 |28.3 GiB| 0.03| 0.08| 0.33| 2.34| 4.18| +|H48 h9 |14.1 GiB| 0.04| 0.15| 0.64| 4.45| 8.09| +|H48 h8 | 7.1 GiB| 0.06| 0.34| 1.36| | | +|H48 h7 | 3.5 GiB| 0.07| 0.47| 2.20| | | +|H48 h6 | 1.8 GiB| 0.13| 0.91| 4.39| | | + +Speed-up factor (higher is better). + +| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| +|:---------|:-------|-------:|-------:|-------:|-------:|--------:| +|H48 h11 |56.5 GiB| 3.90| 8.14| 10.39| 11.86| 8.45| +|H48 h10 |28.3 GiB| 5.72| 9.00| 10.03| 10.04| 9.09| +|H48 h9 |14.1 GiB| 6.56| 9.75| 10.53| | | +|H48 h8 | 7.1 GiB| 8.08| 9.91| 10.40| | | +|H48 h7 | 3.5 GiB| 8.59| 10.30| 10.57| | | +|H48 h6 | 1.8 GiB| 9.60| 10.39| 10.54| | | +<img src="img/16threads.png"> + +</details> diff --git a/benchmarks/tables_1_thread.md b/benchmarks/tables_1_thread.md @@ -0,0 +1,40 @@ +<!-- The following details block can be found in benchmarks/tables_1_thread.md --> +<details><summary>Results: Single solution, single thread</summary> + +Time per cube (in seconds, lower is better). + +| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| +|:---------|:-------|-------:|-------:|-------:|-------:|--------:| +|vcube 212 |58.2 GiB| 0.11| 0.75| 3.43| 27.28| 19.30| +|H48 h11 |56.5 GiB| 0.09| 0.50| 2.24| 15.73| 15.55| +|vcube 404 |31.8 GiB| 0.23| 1.24| 6.10| 59.33| 268.26| +|H48 h10 |28.3 GiB| 0.15| 0.76| 3.36| 23.51| 38.05| +|vcube 308 |21.2 GiB| 0.17| 1.02| 6.20| 58.70| 604.35| +|H48 h9 |14.1 GiB| 0.24| 1.48| 6.69| | | +|vcube 208 | 7.3 GiB| 0.56| 4.36| 20.58| | | +|H48 h8 | 7.1 GiB| 0.46| 3.36| 14.13| | | +|H48 h7 | 3.5 GiB| 0.63| 4.85| 23.25| | | +|vcube 112 | 2.4 GiB| 0.96| 9.29| 40.52| | | +|H48 h6 | 1.8 GiB| 1.25| 9.45| 46.31| | | + +Time per cube adjusted for table size (in seconds \* GiB, lower is better). + +| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| +|:---------|:-------|-------:|-------:|-------:|-------:|--------:| +|vcube 212 |58.2 GiB| 6.43| 43.80| 199.38| 1587.43| 1122.73| +|H48 h11 |56.5 GiB| 5.21| 28.28| 126.68| 889.01| 878.77| +|vcube 404 |31.8 GiB| 7.40| 39.47| 194.01| 1887.94| 8535.87| +|H48 h10 |28.3 GiB| 4.22| 21.37| 94.86| 664.22| 1075.08| +|vcube 308 |21.2 GiB| 3.51| 21.71| 131.50| 1245.26| 12819.94| +|H48 h9 |14.1 GiB| 3.34| 20.97| 94.57| | | +|vcube 208 | 7.3 GiB| 4.08| 31.74| 149.68| | | +|H48 h8 | 7.1 GiB| 3.25| 23.75| 99.82| | | +|H48 h7 | 3.5 GiB| 2.22| 17.12| 82.14| | | +|vcube 112 | 2.4 GiB| 2.33| 22.53| 98.23| | | +|H48 h6 | 1.8 GiB| 2.21| 16.70| 81.86| | | + +<img src="img/17moves1thread.png"> +<img src="img/18moves1thread.png"> +<img src="img/19moves1thread.png"> +<img src="img/20moves1thread.png"> +</details> diff --git a/benchmarks/tables_4_threads.md b/benchmarks/tables_4_threads.md @@ -0,0 +1,27 @@ +<!-- The following details block can be found in benchmarks/tables_4_threads.md --> +<details><summary>Results: Single solution, 4 threads</summary> + +Time per cube (in seconds, lower is better). + +| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| +|:---------|:-------|-------:|-------:|-------:|-------:|--------:| +|H48 h11 |56.5 GiB| 0.04| 0.14| 0.60| 3.97| 4.19| +|H48 h10 |28.3 GiB| 0.05| 0.20| 0.91| 6.65| 9.81| +|H48 h9 |14.1 GiB| 0.07| 0.39| 1.74| 12.54| 18.96| +|H48 h8 | 7.1 GiB| 0.13| 0.90| 3.71| | | +|H48 h7 | 3.5 GiB| 0.17| 1.28| 6.12| | | +|H48 h6 | 1.8 GiB| 0.33| 2.47| 12.22| | | + +Speed-up factor (higher is better). + +| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| +|:---------|:-------|-------:|-------:|-------:|-------:|--------:| +|H48 h11 |56.5 GiB| 2.55| 3.46| 3.75| 3.96| 3.71| +|H48 h10 |28.3 GiB| 3.09| 3.71| 3.67| 3.54| 3.88| +|H48 h9 |14.1 GiB| 3.25| 3.84| 3.85| | | +|H48 h8 | 7.1 GiB| 3.53| 3.72| 3.81| | | +|H48 h7 | 3.5 GiB| 3.60| 3.78| 3.80| | | +|H48 h6 | 1.8 GiB| 3.77| 3.82| 3.79| | | +<img src="img/4threads.png"> + +</details> diff --git a/benchmarks/tables_all_solutions.md b/benchmarks/tables_all_solutions.md @@ -0,0 +1,26 @@ +<!-- The following details block can be found in benchmarks/tables_all_solutions.md --> +<details><summary>Results: All solutions, 16 threads</summary> + +Time per cube (in seconds, lower is better). + +| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| +|:---------|:-------|-------:|-------:|-------:|-------:|--------:| +|H48 h11 |56.5 GiB| 0.04| 0.26| 2.22| 18.96| 16.95| +|H48 h10 |28.3 GiB| 0.05| 0.42| 3.82| 34.42| 36.80| +|H48 h9 |14.1 GiB| 0.08| 0.73| 7.28| | | +|H48 h8 | 7.1 GiB| 0.15| 1.56| 15.41| | | +|H48 h7 | 3.5 GiB| 0.21| 2.38| 26.52| | | +|H48 h6 | 1.8 GiB| 0.39| 4.67| 53.00| | | + +Time per cube adjusted for table size (in seconds \* GiB, lower is better). + +| Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| +|:---------|:-------|-------:|-------:|-------:|-------:|--------:| +|H48 h11 |56.5 GiB| 2.11| 14.55| 125.21| 1071.23| 957.47| +|H48 h10 |28.3 GiB| 1.48| 11.96| 107.83| 972.35| 1039.73| +|H48 h9 |14.1 GiB| 1.08| 10.29| 102.88| | | +|H48 h8 | 7.1 GiB| 1.03| 10.99| 108.87| | | +|H48 h7 | 3.5 GiB| 0.74| 8.41| 93.69| | | +|H48 h6 | 1.8 GiB| 0.70| 8.25| 93.68| | | + +</details>