benchmarks.md (13050B)
1 # Benchmarks for the H48 optimal solver and comparison with vcube 2 3 This page contains some benchmarks for some of the possible 4 configurations of the H48 optimal solver. For comparison we also 5 include similar measurements for Andrew Skalski's impressive optimal 6 solver [vcube](https://github.com/Voltara/vcube), which has been taken 7 as a benchmark reference throughout the development of H48. The two 8 solvers differ both in method (although they both use an A* search) 9 and implementation. 10 11 Similar benchmarks can be found: 12 13 * In Chen Shuang's 14 [dedicated github page](https://github.com/cs0x7f/cube_solver_test/wiki), 15 which includes many more solvers. At the time of writing, the version 16 of H48 used for these benchmarks is a rather old one. Note that 17 cube48opt is a re-implementation of the H48 solver by Chen Shuang. 18 * In Enrico Tenuti's [thesis](https://github.com/enricotenuti/h48thesis), 19 also using an old implementation of H48, but including more data and 20 nice plots. 21 22 ## Setting 23 24 *All movecounts are in Half Turn Metric* 25 26 For the benchmarks we used some sets of random positions that can be found 27 in `benchmarks/scrambles`. We divide them by optimal solution length, 28 because the time to solve a scramble grows exponentially on the number of 29 moves, so mixing up positions with long and short solutions is going to 30 make the shorter solution almost irrelevant. Even within the same solution 31 length the time to solve a random position can vary significantly, 32 so we have used sets of 25 scrambles and taken the average of those, 33 except for the [Superflip](https://en.wikipedia.org/wiki/Superflip), 34 which is a single scramble. 35 36 For short solutions, the very low numbers we get with the largest solver 37 may be particularly inaccurate, because writing the solution (together 38 with some log messages) to standard output may take a significant portion 39 of the time, depending on the terminal emulator used and other factors. 40 41 The main test we performed was finding a single optimal solution, and we 42 compared the results with vcube. This test was run in a single-thread 43 configuration and in two multithread configurations (with 4 and 16 44 threads). We also ran a test on finding *all* optimal solutions which, 45 as far as I know, is a use case not supported by vcube; this latter test 46 was only run on 16 threads. 47 48 Since the size of the pruning table used by the solver is of utmost 49 importance, we include two statistics: time per cube and time per cube 50 *adjusted by table size*. The adjustment we took simply consists in 51 multiplying the time per cube by the size of the table. Empirically 52 this leads to more even results across the board, as the speed of the 53 solvers of the same family seems to scale linearly with the size of the 54 pruning table. 55 56 All benchmark tests were done on the following configuration: 57 58 * CPU: AMD Ryzen 7 7700 (8 cores, 16 virtual threads) 59 * Memory: 2x Corsair Vengeance 32GB 5600MHz 60 * Motherboard: Gigabyte B650M K 61 * Operating system: Debian 13 (Linux kernel 6.12.57) 62 * Compiler: GCC 14.2.0 for H48 and Clang 19.1.7 for vcube 63 64 ## Results (click on each item to expand) 65 66 <details><summary>Single solution, single thread</summary> 67 68 Time per cube (in seconds, lower is better). 69 70 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 71 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 72 |vcube 212 |58.2 GiB| 0.11| 0.75| 3.43| 27.28| 19.30| 73 |H48 h11 |56.5 GiB| 0.13| 0.73| 3.43| 23.28| 19.26| 74 |vcube 404 |31.8 GiB| 0.23| 1.24| 6.10| 59.33| 268.26| 75 |H48 h10 |28.3 GiB| 0.23| 1.20| 6.18| 43.51| 48.84| 76 |vcube 308 |21.2 GiB| 0.17| 1.02| 6.20| 58.70| 604.35| 77 |H48 h9 |14.1 GiB| 0.36| 2.47| 11.79| | | 78 |vcube 208 | 7.3 GiB| 0.56| 4.36| 20.58| | | 79 |H48 h8 | 7.1 GiB| 0.79| 6.09| 25.58| | | 80 |H48 h7 | 3.5 GiB| 1.07| 8.59| 42.16| | | 81 |vcube 112 | 2.4 GiB| 0.96| 9.29| 40.52| | | 82 |H48 h6 | 1.8 GiB| 2.12| 16.29| 82.24| | | 83 84 Time per cube adjusted for table size (in seconds \* GiB, lower is better). 85 86 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 87 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 88 |vcube 212 |58.2 GiB| 6.43| 43.80| 199.38| 1587.43| 1122.73| 89 |H48 h11 |56.5 GiB| 7.33| 41.47| 193.67| 1315.32| 1088.34| 90 |vcube 404 |31.8 GiB| 7.40| 39.47| 194.01| 1887.94| 8535.87| 91 |H48 h10 |28.3 GiB| 6.60| 34.00| 174.65| 1229.18| 1379.90| 92 |vcube 308 |21.2 GiB| 3.51| 21.71| 131.50| 1245.26| 12819.94| 93 |H48 h9 |14.1 GiB| 5.03| 34.85| 166.51| | | 94 |vcube 208 | 7.3 GiB| 4.08| 31.74| 149.68| | | 95 |H48 h8 | 7.1 GiB| 5.61| 43.02| 180.71| | | 96 |H48 h7 | 3.5 GiB| 3.80| 30.34| 148.96| | | 97 |vcube 112 | 2.4 GiB| 2.33| 22.53| 98.23| | | 98 |H48 h6 | 1.8 GiB| 3.75| 28.79| 145.36| | | 99 100 <img src="img/17moves1thread.png"> 101 <img src="img/18moves1thread.png"> 102 <img src="img/19moves1thread.png"> 103 <img src="img/20moves1thread.png"> 104 </details> 105 106 <details><summary>Single solution, 4 threads</summary> 107 108 Time per cube (in seconds, lower is better). 109 110 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 111 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 112 |vcube 212 |58.2 GiB| 0.03| 0.27| 1.04| 7.70| (a) | 113 |H48 h11 |56.5 GiB| 0.04| 0.22| 0.97| 6.98| 5.95| 114 |vcube 404 |31.8 GiB| 0.07| 0.30| 1.65| 16.17| (a) | 115 |H48 h10 |28.3 GiB| 0.07| 0.36| 1.79| 12.42| 14.03| 116 |vcube 308 |21.2 GiB| 0.05| 0.35| 1.78| 16.61| (a) | 117 |H48 h9 |14.1 GiB| 0.11| 0.70| 3.48| 25.61| 25.37| 118 |vcube 208 | 7.3 GiB| 0.16| 1.47| 5.86| | (a) | 119 |H48 h8 | 7.1 GiB| 0.23| 1.74| 7.19| | | 120 |H48 h7 | 3.5 GiB| 0.32| 2.45| 12.12| | | 121 |vcube 112 | 2.4 GiB| 0.29| 3.13| 11.95| | (a) | 122 |H48 h6 | 1.8 GiB| 0.61| 4.61| 24.22| | | 123 124 Time per cube adjusted for table size (in seconds \* GiB, lower is better). 125 126 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 127 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 128 |vcube 212 |58.2 GiB| 2.03| 15.69| 60.25| 447.97| (a) | 129 |H48 h11 |56.5 GiB| 2.53| 12.71| 54.58| 394.33| 336.47| 130 |vcube 404 |31.8 GiB| 2.32| 9.50| 52.46| 514.50| (a) | 131 |H48 h10 |28.3 GiB| 2.11| 10.23| 50.61| 350.78| 396.48| 132 |vcube 308 |21.2 GiB| 1.02| 7.52| 37.82| 352.36| (a) | 133 |H48 h9 |14.1 GiB| 1.57| 9.96| 49.23| 361.85| 358.45| 134 |vcube 208 | 7.3 GiB| 1.18| 10.69| 42.63| | (a) | 135 |H48 h8 | 7.1 GiB| 1.66| 12.27| 50.82| | | 136 |H48 h7 | 3.5 GiB| 1.11| 8.67| 42.83| | | 137 |vcube 112 | 2.4 GiB| 0.69| 7.59| 28.97| | (a) | 138 |H48 h6 | 1.8 GiB| 1.08| 8.16| 42.81| | | 139 140 (a) vcube cannot parallelize on a single scramble, the results for the 141 superflip are going to be the same as in the single thread case. 142 143 <img src="img/17moves4threads.png"> 144 <img src="img/18moves4threads.png"> 145 <img src="img/19moves4threads.png"> 146 <img src="img/20moves4threads.png"> 147 </details> 148 149 <details><summary>Single solution, 16 threads</summary> 150 151 Time per cube (in seconds, lower is better). 152 153 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 154 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 155 |vcube 212 |58.2 GiB| 0.02| 0.13| 0.45| 2.84| (a) | 156 |H48 h11 |56.5 GiB| 0.02| 0.08| 0.33| 2.22| 2.41| 157 |vcube 404 |31.8 GiB| 0.04| 0.14| 0.65| 6.08| (a) | 158 |H48 h10 |28.3 GiB| 0.03| 0.13| 0.58| 4.21| 5.56| 159 |vcube 308 |21.2 GiB| 0.03| 0.19| 0.78| 6.67| (a) | 160 |H48 h9 |14.1 GiB| 0.04| 0.25| 1.10| 8.05| 10.77| 161 |vcube 208 | 7.3 GiB| 0.08| 0.79| 2.43| | (a) | 162 |H48 h8 | 7.1 GiB| 0.08| 0.57| 2.40| | | 163 |H48 h7 | 3.5 GiB| 0.11| 0.83| 4.00| | | 164 |vcube 112 | 2.4 GiB| 0.15| 1.63| 5.10| | (a) | 165 |H48 h6 | 1.8 GiB| 0.21| 1.51| 7.74| | | 166 167 Time per cube adjusted for table size (in seconds \* GiB, lower is better). 168 169 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 170 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 171 |vcube 212 |58.2 GiB| 0.95| 7.83| 26.04| 165.03| (a) | 172 |H48 h11 |56.5 GiB| 1.12| 4.61| 18.90| 125.27| 136.10| 173 |vcube 404 |31.8 GiB| 1.21| 4.60| 20.76| 193.43| (a) | 174 |H48 h10 |28.3 GiB| 0.86| 3.61| 16.47| 118.85| 157.13| 175 |vcube 308 |21.2 GiB| 0.67| 4.01| 16.48| 141.49| (a) | 176 |H48 h9 |14.1 GiB| 0.60| 3.51| 15.57| 113.71| 152.21| 177 |vcube 208 | 7.3 GiB| 0.56| 5.78| 17.68| | (a) | 178 |H48 h8 | 7.1 GiB| 0.59| 4.03| 16.98| | | 179 |H48 h7 | 3.5 GiB| 0.41| 2.94| 14.12| | | 180 |vcube 112 | 2.4 GiB| 0.35| 3.95| 12.37| | (a) | 181 |H48 h6 | 1.8 GiB| 0.38| 2.67| 13.68| | | 182 183 (a) vcube cannot parallelize on a single scramble, the results for the 184 superflip are going to be the same as in the single thread case. 185 186 <img src="img/17moves16threads.png"> 187 <img src="img/18moves16threads.png"> 188 <img src="img/19moves16threads.png"> 189 <img src="img/20moves16threads.png"> 190 </details> 191 192 <details><summary>All solutions, 16 threads</summary> 193 194 *Note: vcube does not have an option for finding multiple solutions.* 195 196 Time per cube (in seconds, lower is better). 197 198 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 199 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 200 |H48 h11 |56.5 GiB| 0.05| 0.41| 3.69| 31.65| 21.80| 201 |H48 h10 |28.3 GiB| 0.07| 0.70| 6.73| 58.61| 48.62| 202 |H48 h9 |14.1 GiB| 0.12| 1.28| 12.93| | | 203 |H48 h8 | 7.1 GiB| 0.25| 2.72| 27.42| | | 204 |H48 h7 | 3.5 GiB| 0.37| 4.24| 47.10| | | 205 |H48 h6 | 1.8 GiB| 0.69| 8.18| 91.85| | | 206 207 Time per cube adjusted for table size (in seconds \* GiB, lower is better). 208 209 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 210 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 211 |H48 h11 |56.5 GiB| 2.57| 23.07| 208.71| 1788.12| 1231.86| 212 |H48 h10 |28.3 GiB| 2.01| 19.77| 190.14| 1655.95| 1373.79| 213 |H48 h9 |14.1 GiB| 1.72| 18.07| 182.67| | | 214 |H48 h8 | 7.1 GiB| 1.74| 19.21| 193.74| | | 215 |H48 h7 | 3.5 GiB| 1.29| 14.98| 166.41| | | 216 |H48 h6 | 1.8 GiB| 1.21| 14.47| 162.36| | | 217 218 </details> 219 220 ## Comments on the results 221 222 * Adjusting for table size, vcube generally is a bit faster than H48, 223 except for 20 moves scrambles where H48 is a clear winner. 224 * On sets of 25 scrambles, H48 performs better than vcube when using 225 multiple threads. However, this advantage will likely disappear (or 226 even invert) if we increase the size of the set. 227 228 ## Other notes 229 230 * To repeat the benchmarks, use `./benchmarks/run-h48-benchmarks.sh`. 231 * All the measurements above exclude the time needed to load the pruning 232 tables into memory, which can be quite significant for large tables. 233 * The measurements also excluded the one-off computation of the pruning 234 tables which, for reasons related to the cube coordinates used, is 235 significantly slower for H48 compared to vcube. 236 * H48's and vcube's approach to multithreading are extremely different: 237 H48 parallelize the search for each cube individually, vcube solves 238 multiple cubes in parallel by dedicating a single thread to each of 239 them. Both apporaches have pros and cons: vcube's approch has less 240 overhead in coordination between the threads, but often some threads 241 may be left without work when there are no more cubes left to solve. 242 * Per-cube parallelization means that H48 will always be faster than 243 vcube when solving a single cube. 244 * vcube only supports x86 processors (Intel, AMD), while H48 runs on any 245 architecture, including e.g. ARM (Macbook M series, android phones) 246 and can be compiled to WebAssembly as well. 247 * For H48, both GCC and Clang have been tried, with the same options; 248 the resulting executable was about 10% faster with GCC compared to Clang. 249 vcube only supports compiling with Clang. 250 * The performance of the H48 solver depends slightly, but measurably, on the 251 alignment of the pruning table in memory. This is not handled by the core 252 library, but by the program that uses it. For these tests, we have used as 253 reference implementation the program in `tools/301_solve_file`, which 254 ensures 64-byte alignment.