nissy-core

The "engine" of nissy, including the H48 optimal solver.
git clone https://git.tronto.net/nissy-core
Download | Log | Files | Refs | README | LICENSE

benchmarks.md (11969B)


      1 # Benchmarks for the H48 optimal solver and comparison with vcube
      2 
      3 This page contains some benchmarks for some of the possible
      4 configurations of the H48 optimal solver. For comparison we also
      5 include similar measurements for Andrew Skalski's impressive optimal
      6 solver [vcube](https://github.com/Voltara/vcube), which has been taken
      7 as a benchmark reference throughout the development of H48. The two
      8 solvers differ both in method (although they both use an A* search)
      9 and implementation.
     10 
     11 Similar benchmarks can be found:
     12 
     13 * In Chen Shuang's
     14   [dedicated github page](https://github.com/cs0x7f/cube_solver_test/wiki),
     15   which includes many more solvers. At the time of writing, the version
     16   of H48 used for these benchmarks is a rather old one. Note that
     17   cube48opt is a re-implementation of the H48 solver by Chen Shuang.
     18 * In Enrico Tenuti's [thesis](https://github.com/enricotenuti/h48thesis),
     19   also using an old implementation of H48, but including more data and
     20   nice plots.
     21 
     22 ## Setting
     23 
     24 *All movecounts are in Half Turn Metric*
     25 
     26 For the benchmarks we used some sets of random positions that can be found
     27 in `benchmarks/scrambles`. We divide them by optimal solution length,
     28 because the time to solve a scramble grows exponentially on the number of
     29 moves, so mixing up positions with long and short solutions is going to
     30 make the shorter solution almost irrelevant. Even within the same solution
     31 length the time to solve a random position can vary significantly,
     32 so we have used sets of 25 scrambles and taken the average of those,
     33 except for the [Superflip](https://en.wikipedia.org/wiki/Superflip),
     34 which is a single scramble.
     35 
     36 For short solutions, the very low numbers we get with the largest solver
     37 may be particularly inaccurate, because writing the solution (together
     38 with some log messages) to standard output may take a significant portion
     39 of the time, depending on the terminal emulator used and other factors.
     40 
     41 All benchmark tests were done on the following configuration:
     42 
     43 * CPU: AMD Ryzen 7 7700 (8 cores, 16 virtual threads)
     44 * Memory: 2x Corsair Vengeance 32GB 5600MHz
     45 * Motherboard: Gigabyte B650M K
     46 * Operating system: Debian 13 (Linux kernel 6.12.57)
     47 * Compiler: GCC 14.2.0 for H48 and Clang 19.1.7 for vcube
     48 
     49 ## Benchmark results
     50 
     51 ### Single solution, single thread
     52 
     53 The first test we performed was finding a single optimal solution, and
     54 we compared the results with vcube.
     55 
     56 This test was run in a single-thread configuration, because the two
     57 solvers use very different strategies for multithreading: vcube can only
     58 parallelize by solving one cube per thread, while H48 is able to take
     59 advantage of multiple threads even when solving a single cube. Therefore
     60 we prefer to compare their single-threaded performance only.
     61 
     62 Since the size of the pruning table used by the solver is of utmost
     63 importance, we include two statistics: time per cube and time per cube
     64 *adjusted by table size*.  The adjustment we took simply consists in
     65 multiplying the time per cube by the size of the table. Empirically this
     66 leads to more even results across the board, although the speed of the
     67 solvers of the same family does not scale exactly linearly with the size
     68 of the pruning table.
     69 
     70 <!-- The following details block can be found in benchmarks/single_thread_comparison.md -->
     71 <details><summary>Results: Single solution, single thread</summary>
     72 
     73 Time per cube (in seconds, lower is better).
     74 
     75 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
     76 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
     77 |vcube 212 |58.2 GiB|    0.11|    0.75|    3.43|   27.28|    19.30|
     78 |H48 h11   |56.5 GiB|    0.09|    0.50|    2.24|   15.73|    15.55|
     79 |vcube 404 |31.8 GiB|    0.23|    1.24|    6.10|   59.33|   268.26|
     80 |H48 h10   |28.3 GiB|    0.15|    0.76|    3.36|   23.51|    38.05|
     81 |vcube 308 |21.2 GiB|    0.17|    1.02|    6.20|   58.70|   604.35|
     82 |H48 h9    |14.1 GiB|    0.24|    1.48|    6.69|        |         |
     83 |vcube 208 | 7.3 GiB|    0.56|    4.36|   20.58|        |         |
     84 |H48 h8    | 7.1 GiB|    0.46|    3.36|   14.13|        |         |
     85 |H48 h7    | 3.5 GiB|    0.63|    4.85|   23.25|        |         |
     86 |vcube 112 | 2.4 GiB|    0.96|    9.29|   40.52|        |         |
     87 |H48 h6    | 1.8 GiB|    1.25|    9.45|   46.31|        |         |
     88 
     89 Time per cube adjusted for table size (in seconds \* GiB, lower is better).
     90 
     91 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
     92 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
     93 |vcube 212 |58.2 GiB|    6.43|   43.80|  199.38| 1587.43|  1122.73|
     94 |H48 h11   |56.5 GiB|    5.21|   28.28|  126.68|  889.01|   878.77|
     95 |vcube 404 |31.8 GiB|    7.40|   39.47|  194.01| 1887.94|  8535.87|
     96 |H48 h10   |28.3 GiB|    4.22|   21.37|   94.86|  664.22|  1075.08|
     97 |vcube 308 |21.2 GiB|    3.51|   21.71|  131.50| 1245.26| 12819.94|
     98 |H48 h9    |14.1 GiB|    3.34|   20.97|   94.57|        |         |
     99 |vcube 208 | 7.3 GiB|    4.08|   31.74|  149.68|        |         |
    100 |H48 h8    | 7.1 GiB|    3.25|   23.75|   99.82|        |         |
    101 |H48 h7    | 3.5 GiB|    2.22|   17.12|   82.14|        |         |
    102 |vcube 112 | 2.4 GiB|    2.33|   22.53|   98.23|        |         |
    103 |H48 h6    | 1.8 GiB|    2.21|   16.70|   81.86|        |         |
    104 
    105 <img src="img/17moves1thread.png">
    106 <img src="img/18moves1thread.png">
    107 <img src="img/19moves1thread.png">
    108 <img src="img/20moves1thread.png">
    109 </details>
    110 
    111 As we can see, adjusting for table size, H48 is generally faster than
    112 vcube.  The gap between the two solvers is larger for scrambles with
    113 longer optimal solutions.
    114 
    115 ### Single solution, multiple threads
    116 
    117 The same benchmark as before is repeated using 4 and 16 threads (recall
    118 that the CPU used for these tests has 8 physical cores and 16 virtual
    119 threads).
    120 
    121 As mentioned above, we don't compare these results to vcube. Instead,
    122 we compare them with the single-threaded results for H48 and we show
    123 how far the speedup factor is from a theoretically optimal 4x and 16x.
    124 
    125 <!-- The following details block can be found in benchmarks/tables_4_threads.md -->
    126 <details><summary>Results: Single solution, 4 threads</summary>
    127 
    128 Time per cube (in seconds, lower is better).
    129 
    130 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    131 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    132 |H48 h11   |56.5 GiB|    0.04|    0.14|    0.60|    3.97|     4.19|
    133 |H48 h10   |28.3 GiB|    0.05|    0.20|    0.91|    6.65|     9.81|
    134 |H48 h9    |14.1 GiB|    0.07|    0.39|    1.74|   12.54|    18.96|
    135 |H48 h8    | 7.1 GiB|    0.13|    0.90|    3.71|        |         |
    136 |H48 h7    | 3.5 GiB|    0.17|    1.28|    6.12|        |         |
    137 |H48 h6    | 1.8 GiB|    0.33|    2.47|   12.22|        |         |
    138 
    139 Speed-up factor (higher is better).
    140 
    141 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    142 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    143 |H48 h11   |56.5 GiB|    2.55|    3.46|    3.75|    3.96|     3.71|
    144 |H48 h10   |28.3 GiB|    3.09|    3.71|    3.67|    3.54|     3.88|
    145 |H48 h9    |14.1 GiB|    3.25|    3.84|    3.85|        |         |
    146 |H48 h8    | 7.1 GiB|    3.53|    3.72|    3.81|        |         |
    147 |H48 h7    | 3.5 GiB|    3.60|    3.78|    3.80|        |         |
    148 |H48 h6    | 1.8 GiB|    3.77|    3.82|    3.79|        |         |
    149 
    150 <img src="img/4threadsspeedupfactor.png">
    151 
    152 </details>
    153 
    154 <!-- The following details block can be found in benchmarks/tables_16_threads.md -->
    155 <details><summary>Results: Single solution, 16 threads</summary>
    156 
    157 Time per cube (in seconds, lower is better).
    158 
    159 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    160 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    161 |H48 h11   |56.5 GiB|    0.02|    0.06|    0.22|    1.33|     1.84|
    162 |H48 h10   |28.3 GiB|    0.03|    0.08|    0.33|    2.34|     4.18|
    163 |H48 h9    |14.1 GiB|    0.04|    0.15|    0.64|    4.45|     8.09|
    164 |H48 h8    | 7.1 GiB|    0.06|    0.34|    1.36|        |         |
    165 |H48 h7    | 3.5 GiB|    0.07|    0.47|    2.20|        |         |
    166 |H48 h6    | 1.8 GiB|    0.13|    0.91|    4.39|        |         |
    167 
    168 Speed-up factor (higher is better).
    169 
    170 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    171 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    172 |H48 h11   |56.5 GiB|    3.90|    8.14|   10.39|   11.86|     8.45|
    173 |H48 h10   |28.3 GiB|    5.72|    9.00|   10.03|   10.04|     9.09|
    174 |H48 h9    |14.1 GiB|    6.56|    9.75|   10.53|        |         |
    175 |H48 h8    | 7.1 GiB|    8.08|    9.91|   10.40|        |         |
    176 |H48 h7    | 3.5 GiB|    8.59|   10.30|   10.57|        |         |
    177 |H48 h6    | 1.8 GiB|    9.60|   10.39|   10.54|        |         |
    178 
    179 <img src="img/16threadsspeedupfactor.png">
    180 
    181 </details>
    182 
    183 We can see that H48 scales pretty well with 4 threads, getting close
    184 to the 4x theoretical maximum speedup in slower cases (small table or
    185 long solutions).
    186 
    187 In the 16 threads benchmark shows that, although the virtual threads
    188 help push us beyond the 8x theoretical speedup that would be provided
    189 by the 8 cores, we are nowhere near a 16x speedup.
    190 
    191 ### All solutions
    192 
    193 Finally, we ran a test on finding *all* optimal solutions which, as
    194 far as I know, is a use case not supported by vcube. For convenience,
    195 this test is only run on 16 threads.
    196 
    197 <!-- The following details block can be found in benchmarks/all_solutions.md -->
    198 <details><summary>Results: All solutions (16 threads)</summary>
    199 
    200 Time per cube (in seconds, lower is better).
    201 
    202 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    203 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    204 |H48 h11   |56.5 GiB|    0.04|    0.26|    2.22|   18.96|    16.95|
    205 |H48 h10   |28.3 GiB|    0.05|    0.42|    3.82|   34.42|    36.80|
    206 |H48 h9    |14.1 GiB|    0.08|    0.73|    7.28|        |         |
    207 |H48 h8    | 7.1 GiB|    0.15|    1.56|   15.41|        |         |
    208 |H48 h7    | 3.5 GiB|    0.21|    2.38|   26.52|        |         |
    209 |H48 h6    | 1.8 GiB|    0.39|    4.67|   53.00|        |         |
    210 
    211 Time per cube adjusted for table size (in seconds \* GiB, lower is better).
    212 
    213 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    214 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    215 |H48 h11   |56.5 GiB|    2.11|   14.55|  125.21| 1071.23|   957.47|
    216 |H48 h10   |28.3 GiB|    1.48|   11.96|  107.83|  972.35|  1039.73|
    217 |H48 h9    |14.1 GiB|    1.08|   10.29|  102.88|        |         |
    218 |H48 h8    | 7.1 GiB|    1.03|   10.99|  108.87|        |         |
    219 |H48 h7    | 3.5 GiB|    0.74|    8.41|   93.69|        |         |
    220 |H48 h6    | 1.8 GiB|    0.70|    8.25|   93.68|        |         |
    221 
    222 </details>
    223 
    224 ## Other notes
    225 
    226 * To repeat the benchmarks, use `./benchmarks/run-h48-benchmarks.sh`.
    227 * All the measurements above exclude the time needed to load the pruning
    228   tables into memory, which can be quite significant for large tables.
    229 * The measurements also excluded the one-off computation of the pruning
    230   tables which, for reasons related to the cube coordinates used, is
    231   significantly slower for H48 compared to vcube.
    232 * H48's and vcube's approach to multithreading are extremely different:
    233   H48 parallelize the search for each cube individually, vcube solves
    234   multiple cubes in parallel by dedicating a single thread to each of
    235   them. Both apporaches have pros and cons: vcube's approch has less
    236   overhead in coordination between the threads, but often some threads
    237   may be left without work when there are no more cubes left to solve.
    238 * Per-cube parallelization means that H48 will always be faster than
    239   vcube when solving a single cube.
    240 * vcube only supports x86 processors (Intel, AMD), while H48 runs on any
    241   architecture, including e.g. ARM (Macbook M series, android phones)
    242   and can be compiled to WebAssembly as well.
    243 * For H48, both GCC and Clang have been tried, with the same options;
    244   the resulting executable was about 10% faster with GCC compared to Clang.
    245   vcube only supports compiling with Clang.