benchmarks.md - nissy-core - The "engine" of nissy, including the H48 optimal solver.

benchmarks.md (12401B)
      1 # Benchmarks for the H48 optimal solver and comparison with vcube
      2 
      3 This page contains some benchmarks for some of the possible
      4 configurations of the H48 optimal solver. For comparison we also
      5 include similar measurements for Andrew Skalski's impressive optimal
      6 solver [vcube](https://github.com/Voltara/vcube), which has been taken
      7 as a benchmark reference throughout the development of H48. The two
      8 solvers differ both in method (although they both use an A* search)
      9 and implementation.
     10 
     11 Similar benchmarks can be found:
     12 
     13 * In Chen Shuang's
     14   [dedicated github page](https://github.com/cs0x7f/cube_solver_test/wiki),
     15   which includes many more solvers. At the time of writing, the version
     16   of H48 used for these benchmarks is a rather old one. Note that
     17   cube48opt is a re-implementation of the H48 solver by Chen Shuang.
     18 * In Enrico Tenuti's [thesis](https://github.com/enricotenuti/h48thesis),
     19   also using an old implementation of H48, but including more data and
     20   nice plots.
     21 
     22 ## Setting
     23 
     24 *All movecounts are in Half Turn Metric*
     25 
     26 For the benchmarks we used some sets of random positions that can be found
     27 in `benchmarks/scrambles`. We divide them by optimal solution length,
     28 because the time to solve a scramble grows exponentially on the number of
     29 moves, so mixing up positions with long and short solutions is going to
     30 make the shorter solution almost irrelevant. Even within the same solution
     31 length the time to solve a random position can vary significantly,
     32 so we have used sets of 25 scrambles and taken the average of those,
     33 except for the [Superflip](https://en.wikipedia.org/wiki/Superflip),
     34 which is a single scramble.
     35 
     36 For short solutions, the very low numbers we get with the largest solver
     37 may be particularly inaccurate, because writing the solution (together
     38 with some log messages) to standard output may take a significant portion
     39 of the time, depending on the terminal emulator used and other factors.
     40 
     41 The main test we performed was finding a single optimal solution, and we
     42 compared the results with vcube. This test was run in a single-thread
     43 configuration and in two multithread configurations (with 4 and 16
     44 threads).  We also ran a test on finding *all* optimal solutions which,
     45 as far as I know, is a use case not supported by vcube; this latter test
     46 was only run on 16 threads.
     47 
     48 Since the size of the pruning table used by the solver is of utmost
     49 importance, we include two statistics: time per cube and time per cube
     50 *adjusted by table size*.  The adjustment we took simply consists in
     51 multiplying the time per cube by the size of the table. Empirically
     52 this leads to more even results across the board, as the speed of the
     53 solvers of the same family seems to scale linearly with the size of the
     54 pruning table.
     55 
     56 All benchmark tests were done on the following configuration:
     57 
     58 * CPU: AMD Ryzen 7 7700 (8 cores, 16 virtual threads)
     59 * Memory: 2x Corsair Vengeance 32GB 5600MHz
     60 * Motherboard: Gigabyte B650M K
     61 * Operating system: Debian 12 (Linux kernel 6.1.0)
     62 * Compiler: GCC 12.2.0 for H48 and Clang 14.0.6 for vcube
     63 
     64 ## Single solution
     65 
     66 Average time for finding a single optimal solution.
     67 
     68 ### Single thread
     69 
     70 Time per cube (in seconds, lower is better).
     71 
     72 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
     73 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
     74 |H48 h11 k2|56.1GiB |  0.23  |  1.15  |  5.08  | 31.30  | 278.47  |
     75 |vcube 404 |31.8GiB |  0.30  |  1.25  |  6.87  | 57.49  | 291.31  |
     76 |H48 h10 k2|28.1GiB |  0.34  |  1.80  |  7.77  |        |         |
     77 |vcube 308 |21.2GiB |  0.20  |  1.11  |  6.92  |        |         |
     78 |H48 h9 k2 |14.1GiB |  0.42  |  2.84  | 12.86  |        |         |
     79 |vcube 208 | 7.3GiB |  0.57  |  4.41  | 20.75  |        |         |
     80 |H48 h8 k2 | 7.1GiB |  0.86  |  6.66  | 27.40  |        |         |
     81 |H48 h7 k2 | 3.6GiB |  1.47  |  8.90  | 42.46  |        |         |
     82 |vcube 112 | 2.4GiB |  1.01  |  9.39  |        |        |         |
     83 |H48 h6 k2 | 1.8GiB |  2.28  | 16.89  |        |        |         |
     84 
     85 Time per cube adjusted for tables size (in seconds \* GiB, lower is better).
     86 
     87 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
     88 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
     89 |H48 h11 k2|56.1GiB | 12.90  | 64.51  | 284.99 |1755.93 |15622.17 |
     90 |vcube 404 |31.8GiB |  9.54  | 39.75  | 218.47 |1828.18 | 9263.66 |
     91 |H48 h10 k2|28.1GiB |  9.55  | 50.58  | 218.34 |        |         |
     92 |vcube 308 |21.2GiB |  4.24  | 23.53  | 146.70 |        |         |
     93 |H48 h9 k2 |14.1GiB |  5.92  | 40.04  | 181.33 |        |         |
     94 |vcube 208 | 7.3GiB |  4.16  | 32.19  | 151.48 |        |         |
     95 |H48 h8 k2 | 7.1GiB |  6.11  | 47.29  | 194.54 |        |         |
     96 |H48 h7 k2 | 3.6GiB |  5.29  | 32.04  | 152.86 |        |         |
     97 |vcube 112 | 2.4GiB |  2.42  | 22.53  |        |        |         |
     98 |H48 h6 k2 | 1.8GiB |  4.10  | 30.40  |        |        |         |
     99 
    100 <details><summary>Plots</summary>
    101 <img src="img/17moves1thread.png">
    102 <img src="img/18moves1thread.png">
    103 <img src="img/19moves1thread.png">
    104 <img src="img/20moves1thread.png">
    105 </details>
    106 
    107 ### Multithread (4 threads)
    108 
    109 Time per cube (in seconds, lower is better).
    110 
    111 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    112 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    113 |H48 h11 k2|56.1GiB |  0.06  |  0.31  |  1.31  |  7.96  |  77.81  |
    114 |vcube 404 |31.8GiB |  0.10  |  0.38  |  1.88  | 16.98  |   (a)   |
    115 |H48 h10 k2|28.1GiB |  0.10  |  0.47  |  2.00  | 13.54  | 114.35  |
    116 |vcube 308 |21.2GiB |  0.06  |  0.42  |  1.95  | 17.73  |   (a)   |
    117 |H48 h9 k2 |14.1GiB |  0.14  |  0.83  |  3.82  | 25.98  | 162.72  |
    118 |vcube 208 | 7.3GiB |  0.17  |  1.49  |  5.88  |        |   (a)   |
    119 |H48 h8 k2 | 7.1GiB |  0.27  |  2.02  |  7.94  |        |         |
    120 |H48 h7 k2 | 3.6GiB |  0.35  |  2.59  | 12.41  |        |         |
    121 |vcube 112 | 2.4GiB |  0.29  |  3.15  | 12.06  |        |   (a)   |
    122 |H48 h6 k2 | 1.8GiB |  0.65  |  4.79  | 23.91  |        |         |
    123 
    124 Time per cube adjusted for tables size (in seconds \* GiB, lower is better).
    125 
    126 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    127 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    128 |H48 h11 k2|56.1GiB |  3.37  | 17.39  | 73.49  | 446.56 | 4365.92 |
    129 |vcube 404 |31.8GiB |  3.80  | 12.08  | 59.78  | 539.96 |   (a)   |
    130 |H48 h10 k2|28.1GiB |  2.81  | 13.21  | 56.20  | 380.47 | 3213.24 |
    131 |vcube 308 |21.2GiB |  1.27  |  8.90  | 41.34  | 375.88 |   (a)   |
    132 |H48 h9 k2 |14.1GiB |  1.97  | 11.70  | 53.86  | 366.32 | 2294.35 |
    133 |vcube 208 | 7.3GiB |  1.24  | 10.88  | 42.92  |        |   (a)   |
    134 |H48 h8 k2 | 7.1GiB |  1.92  | 14.34  | 56.37  |        |         |
    135 |H48 h7 k2 | 3.6GiB |  1.26  |  9.32  | 44.68  |        |         |
    136 |vcube 112 | 2.4GiB |  0.70  |  7.56  | 28.94  |        |   (a)   |
    137 |H48 h6 k2 | 1.8GiB |  1.17  |  8.62  | 43.04  |        |         |
    138 
    139 <details><summary>Plots</summary>
    140 <img src="img/17moves4threads.png">
    141 <img src="img/18moves4threads.png">
    142 <img src="img/19moves4threads.png">
    143 <img src="img/20moves4threads.png">
    144 </details>
    145 
    146 (a) vcube cannot parallelize on a single scramble, the results for the
    147 Superflip are going to be the same as in the single thread case.
    148 
    149 ### Multithread (16 threads)
    150 
    151 Time per cube (in seconds, lower is better).
    152 
    153 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    154 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    155 |H48 h11 k2|56.1GiB |  0.02  |  0.10  |  0.43  |  2.48  |  26.29  |
    156 |vcube 404 |31.8GiB |  0.03  |  0.16  |  0.67  |  6.36  |   (a)   |
    157 |H48 h10 k2|28.1GiB |  0.03  |  0.16  |  0.74  |  4.43  |         |
    158 |vcube 308 |21.2GiB |  0.04  |  0.22  |  0.89  |  9.53  |   (a)   |
    159 |H48 h9 k2 |14.1GiB |  0.04  |  0.26  |  1.18  |  8.31  |         |
    160 |vcube 208 | 7.3GiB |  0.08  |  0.80  |  2.38  |        |   (a)   |
    161 |H48 h8 k2 | 7.1GiB |  0.08  |  0.60  |  2.48  |        |         |
    162 |H48 h7 k2 | 3.6GiB |  0.11  |  0.81  |  3.91  |        |         |
    163 |vcube 112 | 2.4GiB |  0.15  |  1.66  |  5.18  |        |   (a)   |
    164 |H48 h6 k2 | 1.8GiB |  0.21  |  1.53  |  7.82  |        |         |
    165 
    166 Time per cube adjusted for tables size (in seconds \* GiB, lower is better).
    167 
    168 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|Superflip|
    169 |:---------|:-------|-------:|-------:|-------:|-------:|--------:|
    170 |H48 h11 k2|56.1GiB |  1.12  |  5.61  | 24.12  | 139.13 | 1474.87 |
    171 |vcube 404 |31.8GiB |  1.08  |  5.09  | 21.31  | 202.25 |   (a)   |
    172 |H48 h10 k2|28.1GiB |  0.84  |  4.50  | 20.79  | 124.48 |         |
    173 |vcube 308 |21.2GiB |  0.85  |  4.66  | 18.87  | 202.04 |   (a)   |
    174 |H48 h9 k2 |14.1GiB |  0.56  |  3.67  | 16.64  | 117.17 |         |
    175 |vcube 208 | 7.3GiB |  0.58  |  5.84  | 17.37  |        |   (a)   |
    176 |H48 h8 k2 | 7.1GiB |  0.57  |  4.26  | 17.60  |        |         |
    177 |H48 h7 k2 | 3.6GiB |  0.40  |  2.92  | 14.07  |        |         |
    178 |vcube 112 | 2.4GiB |  0.36  |  3.98  | 12.43  |        |   (a)   |
    179 |H48 h6 k2 | 1.8GiB |  0.38  |  2.75  | 14.08  |        |         |
    180 
    181 <details><summary>Plots</summary>
    182 <img src="img/17moves16threads.png">
    183 <img src="img/18moves16threads.png">
    184 <img src="img/19moves16threads.png">
    185 <img src="img/20moves16threads.png">
    186 </details>
    187 
    188 (a) vcube cannot parallelize on a single scramble, the results for the
    189 Superflip are going to be the same as in the single thread case.
    190 
    191 ## All optimal solutions
    192 
    193 Average time for finding all optimal solutions.
    194 
    195 ### Multithread (16 threads)
    196 
    197 Time per cube (in seconds, lower is better).
    198 
    199 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|
    200 |:---------|:-------|-------:|-------:|-------:|-------:|
    201 |H48 h11 k2|56.1GiB |  0.05  |  0.50  |  4.24  | 19.75  |
    202 |H48 h10 k2|28.1GiB |  0.08  |  0.88  |  6.94  |        |
    203 |H48 h9 k2 |14.1GiB |  0.13  |  1.39  | 13.50  |        |
    204 |H48 h8 k2 | 7.1GiB |  0.25  |  2.85  |        |        |
    205 |H48 h7 k2 | 3.6GiB |  0.36  |  4.24  |        |        |
    206 |H48 h6 k2 | 1.8GiB |  0.69  |  8.20  |        |        |
    207 
    208 Time per cube adjusted for tables size (in seconds \* GiB, lower is better).
    209 
    210 |  Solver  |  Size  |17 moves|18 moves|19 moves|20 moves|
    211 |:---------|:-------|-------:|-------:|-------:|-------:|
    212 |H48 h11 k2|56.1GiB |  2.81  | 28.05  | 237.86 |1107.98 |
    213 |H48 h10 k2|28.1GiB |  2.25  | 24.73  | 195.01 |        |
    214 |H48 h9 k2 |14.1GiB |  1.83  | 19.60  | 190.35 |        |
    215 |H48 h8 k2 | 7.1GiB |  1.77  | 20.24  |        |        |
    216 |H48 h7 k2 | 3.6GiB |  1.30  | 15.26  |        |        |
    217 |H48 h6 k2 | 1.8GiB |  1.24  | 14.76  |        |        |
    218 
    219 ## Comments on the results
    220 
    221 * Adjusting for table size, vcube generally is a bit faster than H48,
    222   except for 20 moves scrambles where H48 is a clear winner.
    223 * On sets of 25 scrambles, H48 performs better than vcube when using
    224   multiple threads. However, this advantage will likely disappear (or
    225   even invert) if we increase the size of the set.
    226 
    227 ## Other notes
    228 
    229 * Missing values in the table mean that the test is very slow and I did
    230   not want to wait for it to finish. I may add these values in the future.
    231 * All the measurements above exclude the time needed to load the pruning
    232   tables into memory, which can be quite significant for large tables. To
    233   repeat these measurements, one can use the tool `301_solve_file`. For
    234   example:
    235 
    236 ```
    237 ./build tool solve_file h48h7k2 ./benchmarks/scrambles/scrambles-16.txt
    238 ```
    239 
    240   To find all solutions, add something like `99999 0` at the end of the
    241   command. This will tell the tool to find up to `99999` solutions that
    242   are at most `0` moves longer than optimal.
    243 * The measurements also excluded the one-off computation of the pruning
    244   tables which, for reasons related to the cube coordinates used, is
    245   significantly slower for H48 compared to vcube.
    246 * H48's and vcube's approach to multithreading are extremely different:
    247   H48 parallelize the search for each cube individually, vcube solves
    248   multiple cubes in parallel by dedicating a single thread to each of
    249   them. Both apporaches have pros and cons: vcube's approch has less
    250   overhead in coordination between the threads, but often some threads
    251   may be left without work when there are no more cubes left to solve.
    252 * Per-cube parallelization means that H48 will always be faster than
    253   vcube when solving a single cube.
    254 * vcube only supports x86 processors (Intel, AMD), while H48 runs on any
    255   architecture, including e.g. ARM (Macbook M series, android phones)
    256   and can be compiled to WebAssembly as well.
	nissy-core The "engine" of nissy, including the H48 optimal solver.
	git clone https://git.tronto.net/nissy-core
	Download \| Log \| Files \| Refs \| README \| LICENSE