benchmarks.md (11969B)
1 # Benchmarks for the H48 optimal solver and comparison with vcube 2 3 This page contains some benchmarks for some of the possible 4 configurations of the H48 optimal solver. For comparison we also 5 include similar measurements for Andrew Skalski's impressive optimal 6 solver [vcube](https://github.com/Voltara/vcube), which has been taken 7 as a benchmark reference throughout the development of H48. The two 8 solvers differ both in method (although they both use an A* search) 9 and implementation. 10 11 Similar benchmarks can be found: 12 13 * In Chen Shuang's 14 [dedicated github page](https://github.com/cs0x7f/cube_solver_test/wiki), 15 which includes many more solvers. At the time of writing, the version 16 of H48 used for these benchmarks is a rather old one. Note that 17 cube48opt is a re-implementation of the H48 solver by Chen Shuang. 18 * In Enrico Tenuti's [thesis](https://github.com/enricotenuti/h48thesis), 19 also using an old implementation of H48, but including more data and 20 nice plots. 21 22 ## Setting 23 24 *All movecounts are in Half Turn Metric* 25 26 For the benchmarks we used some sets of random positions that can be found 27 in `benchmarks/scrambles`. We divide them by optimal solution length, 28 because the time to solve a scramble grows exponentially on the number of 29 moves, so mixing up positions with long and short solutions is going to 30 make the shorter solution almost irrelevant. Even within the same solution 31 length the time to solve a random position can vary significantly, 32 so we have used sets of 25 scrambles and taken the average of those, 33 except for the [Superflip](https://en.wikipedia.org/wiki/Superflip), 34 which is a single scramble. 35 36 For short solutions, the very low numbers we get with the largest solver 37 may be particularly inaccurate, because writing the solution (together 38 with some log messages) to standard output may take a significant portion 39 of the time, depending on the terminal emulator used and other factors. 40 41 All benchmark tests were done on the following configuration: 42 43 * CPU: AMD Ryzen 7 7700 (8 cores, 16 virtual threads) 44 * Memory: 2x Corsair Vengeance 32GB 5600MHz 45 * Motherboard: Gigabyte B650M K 46 * Operating system: Debian 13 (Linux kernel 6.12.57) 47 * Compiler: GCC 14.2.0 for H48 and Clang 19.1.7 for vcube 48 49 ## Benchmark results 50 51 ### Single solution, single thread 52 53 The first test we performed was finding a single optimal solution, and 54 we compared the results with vcube. 55 56 This test was run in a single-thread configuration, because the two 57 solvers use very different strategies for multithreading: vcube can only 58 parallelize by solving one cube per thread, while H48 is able to take 59 advantage of multiple threads even when solving a single cube. Therefore 60 we prefer to compare their single-threaded performance only. 61 62 Since the size of the pruning table used by the solver is of utmost 63 importance, we include two statistics: time per cube and time per cube 64 *adjusted by table size*. The adjustment we took simply consists in 65 multiplying the time per cube by the size of the table. Empirically this 66 leads to more even results across the board, although the speed of the 67 solvers of the same family does not scale exactly linearly with the size 68 of the pruning table. 69 70 <!-- The following details block can be found in benchmarks/single_thread_comparison.md --> 71 <details><summary>Results: Single solution, single thread</summary> 72 73 Time per cube (in seconds, lower is better). 74 75 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 76 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 77 |vcube 212 |58.2 GiB| 0.11| 0.75| 3.43| 27.28| 19.30| 78 |H48 h11 |56.5 GiB| 0.09| 0.50| 2.24| 15.73| 15.55| 79 |vcube 404 |31.8 GiB| 0.23| 1.24| 6.10| 59.33| 268.26| 80 |H48 h10 |28.3 GiB| 0.15| 0.76| 3.36| 23.51| 38.05| 81 |vcube 308 |21.2 GiB| 0.17| 1.02| 6.20| 58.70| 604.35| 82 |H48 h9 |14.1 GiB| 0.24| 1.48| 6.69| | | 83 |vcube 208 | 7.3 GiB| 0.56| 4.36| 20.58| | | 84 |H48 h8 | 7.1 GiB| 0.46| 3.36| 14.13| | | 85 |H48 h7 | 3.5 GiB| 0.63| 4.85| 23.25| | | 86 |vcube 112 | 2.4 GiB| 0.96| 9.29| 40.52| | | 87 |H48 h6 | 1.8 GiB| 1.25| 9.45| 46.31| | | 88 89 Time per cube adjusted for table size (in seconds \* GiB, lower is better). 90 91 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 92 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 93 |vcube 212 |58.2 GiB| 6.43| 43.80| 199.38| 1587.43| 1122.73| 94 |H48 h11 |56.5 GiB| 5.21| 28.28| 126.68| 889.01| 878.77| 95 |vcube 404 |31.8 GiB| 7.40| 39.47| 194.01| 1887.94| 8535.87| 96 |H48 h10 |28.3 GiB| 4.22| 21.37| 94.86| 664.22| 1075.08| 97 |vcube 308 |21.2 GiB| 3.51| 21.71| 131.50| 1245.26| 12819.94| 98 |H48 h9 |14.1 GiB| 3.34| 20.97| 94.57| | | 99 |vcube 208 | 7.3 GiB| 4.08| 31.74| 149.68| | | 100 |H48 h8 | 7.1 GiB| 3.25| 23.75| 99.82| | | 101 |H48 h7 | 3.5 GiB| 2.22| 17.12| 82.14| | | 102 |vcube 112 | 2.4 GiB| 2.33| 22.53| 98.23| | | 103 |H48 h6 | 1.8 GiB| 2.21| 16.70| 81.86| | | 104 105 <img src="img/17moves1thread.png"> 106 <img src="img/18moves1thread.png"> 107 <img src="img/19moves1thread.png"> 108 <img src="img/20moves1thread.png"> 109 </details> 110 111 As we can see, adjusting for table size, H48 is generally faster than 112 vcube. The gap between the two solvers is larger for scrambles with 113 longer optimal solutions. 114 115 ### Single solution, multiple threads 116 117 The same benchmark as before is repeated using 4 and 16 threads (recall 118 that the CPU used for these tests has 8 physical cores and 16 virtual 119 threads). 120 121 As mentioned above, we don't compare these results to vcube. Instead, 122 we compare them with the single-threaded results for H48 and we show 123 how far the speedup factor is from a theoretically optimal 4x and 16x. 124 125 <!-- The following details block can be found in benchmarks/tables_4_threads.md --> 126 <details><summary>Results: Single solution, 4 threads</summary> 127 128 Time per cube (in seconds, lower is better). 129 130 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 131 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 132 |H48 h11 |56.5 GiB| 0.04| 0.14| 0.60| 3.97| 4.19| 133 |H48 h10 |28.3 GiB| 0.05| 0.20| 0.91| 6.65| 9.81| 134 |H48 h9 |14.1 GiB| 0.07| 0.39| 1.74| 12.54| 18.96| 135 |H48 h8 | 7.1 GiB| 0.13| 0.90| 3.71| | | 136 |H48 h7 | 3.5 GiB| 0.17| 1.28| 6.12| | | 137 |H48 h6 | 1.8 GiB| 0.33| 2.47| 12.22| | | 138 139 Speed-up factor (higher is better). 140 141 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 142 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 143 |H48 h11 |56.5 GiB| 2.55| 3.46| 3.75| 3.96| 3.71| 144 |H48 h10 |28.3 GiB| 3.09| 3.71| 3.67| 3.54| 3.88| 145 |H48 h9 |14.1 GiB| 3.25| 3.84| 3.85| | | 146 |H48 h8 | 7.1 GiB| 3.53| 3.72| 3.81| | | 147 |H48 h7 | 3.5 GiB| 3.60| 3.78| 3.80| | | 148 |H48 h6 | 1.8 GiB| 3.77| 3.82| 3.79| | | 149 150 <img src="img/4threadsspeedupfactor.png"> 151 152 </details> 153 154 <!-- The following details block can be found in benchmarks/tables_16_threads.md --> 155 <details><summary>Results: Single solution, 16 threads</summary> 156 157 Time per cube (in seconds, lower is better). 158 159 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 160 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 161 |H48 h11 |56.5 GiB| 0.02| 0.06| 0.22| 1.33| 1.84| 162 |H48 h10 |28.3 GiB| 0.03| 0.08| 0.33| 2.34| 4.18| 163 |H48 h9 |14.1 GiB| 0.04| 0.15| 0.64| 4.45| 8.09| 164 |H48 h8 | 7.1 GiB| 0.06| 0.34| 1.36| | | 165 |H48 h7 | 3.5 GiB| 0.07| 0.47| 2.20| | | 166 |H48 h6 | 1.8 GiB| 0.13| 0.91| 4.39| | | 167 168 Speed-up factor (higher is better). 169 170 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 171 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 172 |H48 h11 |56.5 GiB| 3.90| 8.14| 10.39| 11.86| 8.45| 173 |H48 h10 |28.3 GiB| 5.72| 9.00| 10.03| 10.04| 9.09| 174 |H48 h9 |14.1 GiB| 6.56| 9.75| 10.53| | | 175 |H48 h8 | 7.1 GiB| 8.08| 9.91| 10.40| | | 176 |H48 h7 | 3.5 GiB| 8.59| 10.30| 10.57| | | 177 |H48 h6 | 1.8 GiB| 9.60| 10.39| 10.54| | | 178 179 <img src="img/16threadsspeedupfactor.png"> 180 181 </details> 182 183 We can see that H48 scales pretty well with 4 threads, getting close 184 to the 4x theoretical maximum speedup in slower cases (small table or 185 long solutions). 186 187 In the 16 threads benchmark shows that, although the virtual threads 188 help push us beyond the 8x theoretical speedup that would be provided 189 by the 8 cores, we are nowhere near a 16x speedup. 190 191 ### All solutions 192 193 Finally, we ran a test on finding *all* optimal solutions which, as 194 far as I know, is a use case not supported by vcube. For convenience, 195 this test is only run on 16 threads. 196 197 <!-- The following details block can be found in benchmarks/all_solutions.md --> 198 <details><summary>Results: All solutions (16 threads)</summary> 199 200 Time per cube (in seconds, lower is better). 201 202 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 203 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 204 |H48 h11 |56.5 GiB| 0.04| 0.26| 2.22| 18.96| 16.95| 205 |H48 h10 |28.3 GiB| 0.05| 0.42| 3.82| 34.42| 36.80| 206 |H48 h9 |14.1 GiB| 0.08| 0.73| 7.28| | | 207 |H48 h8 | 7.1 GiB| 0.15| 1.56| 15.41| | | 208 |H48 h7 | 3.5 GiB| 0.21| 2.38| 26.52| | | 209 |H48 h6 | 1.8 GiB| 0.39| 4.67| 53.00| | | 210 211 Time per cube adjusted for table size (in seconds \* GiB, lower is better). 212 213 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 214 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 215 |H48 h11 |56.5 GiB| 2.11| 14.55| 125.21| 1071.23| 957.47| 216 |H48 h10 |28.3 GiB| 1.48| 11.96| 107.83| 972.35| 1039.73| 217 |H48 h9 |14.1 GiB| 1.08| 10.29| 102.88| | | 218 |H48 h8 | 7.1 GiB| 1.03| 10.99| 108.87| | | 219 |H48 h7 | 3.5 GiB| 0.74| 8.41| 93.69| | | 220 |H48 h6 | 1.8 GiB| 0.70| 8.25| 93.68| | | 221 222 </details> 223 224 ## Other notes 225 226 * To repeat the benchmarks, use `./benchmarks/run-h48-benchmarks.sh`. 227 * All the measurements above exclude the time needed to load the pruning 228 tables into memory, which can be quite significant for large tables. 229 * The measurements also excluded the one-off computation of the pruning 230 tables which, for reasons related to the cube coordinates used, is 231 significantly slower for H48 compared to vcube. 232 * H48's and vcube's approach to multithreading are extremely different: 233 H48 parallelize the search for each cube individually, vcube solves 234 multiple cubes in parallel by dedicating a single thread to each of 235 them. Both apporaches have pros and cons: vcube's approch has less 236 overhead in coordination between the threads, but often some threads 237 may be left without work when there are no more cubes left to solve. 238 * Per-cube parallelization means that H48 will always be faster than 239 vcube when solving a single cube. 240 * vcube only supports x86 processors (Intel, AMD), while H48 runs on any 241 architecture, including e.g. ARM (Macbook M series, android phones) 242 and can be compiled to WebAssembly as well. 243 * For H48, both GCC and Clang have been tried, with the same options; 244 the resulting executable was about 10% faster with GCC compared to Clang. 245 vcube only supports compiling with Clang.