benchmarks.md (12401B)
1 # Benchmarks for the H48 optimal solver and comparison with vcube 2 3 This page contains some benchmarks for some of the possible 4 configurations of the H48 optimal solver. For comparison we also 5 include similar measurements for Andrew Skalski's impressive optimal 6 solver [vcube](https://github.com/Voltara/vcube), which has been taken 7 as a benchmark reference throughout the development of H48. The two 8 solvers differ both in method (although they both use an A* search) 9 and implementation. 10 11 Similar benchmarks can be found: 12 13 * In Chen Shuang's 14 [dedicated github page](https://github.com/cs0x7f/cube_solver_test/wiki), 15 which includes many more solvers. At the time of writing, the version 16 of H48 used for these benchmarks is a rather old one. Note that 17 cube48opt is a re-implementation of the H48 solver by Chen Shuang. 18 * In Enrico Tenuti's [thesis](https://github.com/enricotenuti/h48thesis), 19 also using an old implementation of H48, but including more data and 20 nice plots. 21 22 ## Setting 23 24 *All movecounts are in Half Turn Metric* 25 26 For the benchmarks we used some sets of random positions that can be found 27 in `benchmarks/scrambles`. We divide them by optimal solution length, 28 because the time to solve a scramble grows exponentially on the number of 29 moves, so mixing up positions with long and short solutions is going to 30 make the shorter solution almost irrelevant. Even within the same solution 31 length the time to solve a random position can vary significantly, 32 so we have used sets of 25 scrambles and taken the average of those, 33 except for the [Superflip](https://en.wikipedia.org/wiki/Superflip), 34 which is a single scramble. 35 36 For short solutions, the very low numbers we get with the largest solver 37 may be particularly inaccurate, because writing the solution (together 38 with some log messages) to standard output may take a significant portion 39 of the time, depending on the terminal emulator used and other factors. 40 41 The main test we performed was finding a single optimal solution, and we 42 compared the results with vcube. This test was run in a single-thread 43 configuration and in two multithread configurations (with 4 and 16 44 threads). We also ran a test on finding *all* optimal solutions which, 45 as far as I know, is a use case not supported by vcube; this latter test 46 was only run on 16 threads. 47 48 Since the size of the pruning table used by the solver is of utmost 49 importance, we include two statistics: time per cube and time per cube 50 *adjusted by table size*. The adjustment we took simply consists in 51 multiplying the time per cube by the size of the table. Empirically 52 this leads to more even results across the board, as the speed of the 53 solvers of the same family seems to scale linearly with the size of the 54 pruning table. 55 56 All benchmark tests were done on the following configuration: 57 58 * CPU: AMD Ryzen 7 7700 (8 cores, 16 virtual threads) 59 * Memory: 2x Corsair Vengeance 32GB 5600MHz 60 * Motherboard: Gigabyte B650M K 61 * Operating system: Debian 12 (Linux kernel 6.1.0) 62 * Compiler: GCC 12.2.0 for H48 and Clang 14.0.6 for vcube 63 64 ## Single solution 65 66 Average time for finding a single optimal solution. 67 68 ### Single thread 69 70 Time per cube (in seconds, lower is better). 71 72 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 73 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 74 |H48 h11 k2|56.1GiB | 0.23 | 1.15 | 5.08 | 31.30 | 278.47 | 75 |vcube 404 |31.8GiB | 0.30 | 1.25 | 6.87 | 57.49 | 291.31 | 76 |H48 h10 k2|28.1GiB | 0.34 | 1.80 | 7.77 | | | 77 |vcube 308 |21.2GiB | 0.20 | 1.11 | 6.92 | | | 78 |H48 h9 k2 |14.1GiB | 0.42 | 2.84 | 12.86 | | | 79 |vcube 208 | 7.3GiB | 0.57 | 4.41 | 20.75 | | | 80 |H48 h8 k2 | 7.1GiB | 0.86 | 6.66 | 27.40 | | | 81 |H48 h7 k2 | 3.6GiB | 1.47 | 8.90 | 42.46 | | | 82 |vcube 112 | 2.4GiB | 1.01 | 9.39 | | | | 83 |H48 h6 k2 | 1.8GiB | 2.28 | 16.89 | | | | 84 85 Time per cube adjusted for tables size (in seconds \* GiB, lower is better). 86 87 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 88 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 89 |H48 h11 k2|56.1GiB | 12.90 | 64.51 | 284.99 |1755.93 |15622.17 | 90 |vcube 404 |31.8GiB | 9.54 | 39.75 | 218.47 |1828.18 | 9263.66 | 91 |H48 h10 k2|28.1GiB | 9.55 | 50.58 | 218.34 | | | 92 |vcube 308 |21.2GiB | 4.24 | 23.53 | 146.70 | | | 93 |H48 h9 k2 |14.1GiB | 5.92 | 40.04 | 181.33 | | | 94 |vcube 208 | 7.3GiB | 4.16 | 32.19 | 151.48 | | | 95 |H48 h8 k2 | 7.1GiB | 6.11 | 47.29 | 194.54 | | | 96 |H48 h7 k2 | 3.6GiB | 5.29 | 32.04 | 152.86 | | | 97 |vcube 112 | 2.4GiB | 2.42 | 22.53 | | | | 98 |H48 h6 k2 | 1.8GiB | 4.10 | 30.40 | | | | 99 100 <details><summary>Plots</summary> 101 <img src="img/17moves1thread.png"> 102 <img src="img/18moves1thread.png"> 103 <img src="img/19moves1thread.png"> 104 <img src="img/20moves1thread.png"> 105 </details> 106 107 ### Multithread (4 threads) 108 109 Time per cube (in seconds, lower is better). 110 111 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 112 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 113 |H48 h11 k2|56.1GiB | 0.06 | 0.31 | 1.31 | 7.96 | 77.81 | 114 |vcube 404 |31.8GiB | 0.10 | 0.38 | 1.88 | 16.98 | (a) | 115 |H48 h10 k2|28.1GiB | 0.10 | 0.47 | 2.00 | 13.54 | 114.35 | 116 |vcube 308 |21.2GiB | 0.06 | 0.42 | 1.95 | 17.73 | (a) | 117 |H48 h9 k2 |14.1GiB | 0.14 | 0.83 | 3.82 | 25.98 | 162.72 | 118 |vcube 208 | 7.3GiB | 0.17 | 1.49 | 5.88 | | (a) | 119 |H48 h8 k2 | 7.1GiB | 0.27 | 2.02 | 7.94 | | | 120 |H48 h7 k2 | 3.6GiB | 0.35 | 2.59 | 12.41 | | | 121 |vcube 112 | 2.4GiB | 0.29 | 3.15 | 12.06 | | (a) | 122 |H48 h6 k2 | 1.8GiB | 0.65 | 4.79 | 23.91 | | | 123 124 Time per cube adjusted for tables size (in seconds \* GiB, lower is better). 125 126 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 127 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 128 |H48 h11 k2|56.1GiB | 3.37 | 17.39 | 73.49 | 446.56 | 4365.92 | 129 |vcube 404 |31.8GiB | 3.80 | 12.08 | 59.78 | 539.96 | (a) | 130 |H48 h10 k2|28.1GiB | 2.81 | 13.21 | 56.20 | 380.47 | 3213.24 | 131 |vcube 308 |21.2GiB | 1.27 | 8.90 | 41.34 | 375.88 | (a) | 132 |H48 h9 k2 |14.1GiB | 1.97 | 11.70 | 53.86 | 366.32 | 2294.35 | 133 |vcube 208 | 7.3GiB | 1.24 | 10.88 | 42.92 | | (a) | 134 |H48 h8 k2 | 7.1GiB | 1.92 | 14.34 | 56.37 | | | 135 |H48 h7 k2 | 3.6GiB | 1.26 | 9.32 | 44.68 | | | 136 |vcube 112 | 2.4GiB | 0.70 | 7.56 | 28.94 | | (a) | 137 |H48 h6 k2 | 1.8GiB | 1.17 | 8.62 | 43.04 | | | 138 139 <details><summary>Plots</summary> 140 <img src="img/17moves4threads.png"> 141 <img src="img/18moves4threads.png"> 142 <img src="img/19moves4threads.png"> 143 <img src="img/20moves4threads.png"> 144 </details> 145 146 (a) vcube cannot parallelize on a single scramble, the results for the 147 Superflip are going to be the same as in the single thread case. 148 149 ### Multithread (16 threads) 150 151 Time per cube (in seconds, lower is better). 152 153 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 154 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 155 |H48 h11 k2|56.1GiB | 0.02 | 0.10 | 0.43 | 2.48 | 26.29 | 156 |vcube 404 |31.8GiB | 0.03 | 0.16 | 0.67 | 6.36 | (a) | 157 |H48 h10 k2|28.1GiB | 0.03 | 0.16 | 0.74 | 4.43 | | 158 |vcube 308 |21.2GiB | 0.04 | 0.22 | 0.89 | 9.53 | (a) | 159 |H48 h9 k2 |14.1GiB | 0.04 | 0.26 | 1.18 | 8.31 | | 160 |vcube 208 | 7.3GiB | 0.08 | 0.80 | 2.38 | | (a) | 161 |H48 h8 k2 | 7.1GiB | 0.08 | 0.60 | 2.48 | | | 162 |H48 h7 k2 | 3.6GiB | 0.11 | 0.81 | 3.91 | | | 163 |vcube 112 | 2.4GiB | 0.15 | 1.66 | 5.18 | | (a) | 164 |H48 h6 k2 | 1.8GiB | 0.21 | 1.53 | 7.82 | | | 165 166 Time per cube adjusted for tables size (in seconds \* GiB, lower is better). 167 168 | Solver | Size |17 moves|18 moves|19 moves|20 moves|Superflip| 169 |:---------|:-------|-------:|-------:|-------:|-------:|--------:| 170 |H48 h11 k2|56.1GiB | 1.12 | 5.61 | 24.12 | 139.13 | 1474.87 | 171 |vcube 404 |31.8GiB | 1.08 | 5.09 | 21.31 | 202.25 | (a) | 172 |H48 h10 k2|28.1GiB | 0.84 | 4.50 | 20.79 | 124.48 | | 173 |vcube 308 |21.2GiB | 0.85 | 4.66 | 18.87 | 202.04 | (a) | 174 |H48 h9 k2 |14.1GiB | 0.56 | 3.67 | 16.64 | 117.17 | | 175 |vcube 208 | 7.3GiB | 0.58 | 5.84 | 17.37 | | (a) | 176 |H48 h8 k2 | 7.1GiB | 0.57 | 4.26 | 17.60 | | | 177 |H48 h7 k2 | 3.6GiB | 0.40 | 2.92 | 14.07 | | | 178 |vcube 112 | 2.4GiB | 0.36 | 3.98 | 12.43 | | (a) | 179 |H48 h6 k2 | 1.8GiB | 0.38 | 2.75 | 14.08 | | | 180 181 <details><summary>Plots</summary> 182 <img src="img/17moves16threads.png"> 183 <img src="img/18moves16threads.png"> 184 <img src="img/19moves16threads.png"> 185 <img src="img/20moves16threads.png"> 186 </details> 187 188 (a) vcube cannot parallelize on a single scramble, the results for the 189 Superflip are going to be the same as in the single thread case. 190 191 ## All optimal solutions 192 193 Average time for finding all optimal solutions. 194 195 ### Multithread (16 threads) 196 197 Time per cube (in seconds, lower is better). 198 199 | Solver | Size |17 moves|18 moves|19 moves|20 moves| 200 |:---------|:-------|-------:|-------:|-------:|-------:| 201 |H48 h11 k2|56.1GiB | 0.05 | 0.50 | 4.24 | 19.75 | 202 |H48 h10 k2|28.1GiB | 0.08 | 0.88 | 6.94 | | 203 |H48 h9 k2 |14.1GiB | 0.13 | 1.39 | 13.50 | | 204 |H48 h8 k2 | 7.1GiB | 0.25 | 2.85 | | | 205 |H48 h7 k2 | 3.6GiB | 0.36 | 4.24 | | | 206 |H48 h6 k2 | 1.8GiB | 0.69 | 8.20 | | | 207 208 Time per cube adjusted for tables size (in seconds \* GiB, lower is better). 209 210 | Solver | Size |17 moves|18 moves|19 moves|20 moves| 211 |:---------|:-------|-------:|-------:|-------:|-------:| 212 |H48 h11 k2|56.1GiB | 2.81 | 28.05 | 237.86 |1107.98 | 213 |H48 h10 k2|28.1GiB | 2.25 | 24.73 | 195.01 | | 214 |H48 h9 k2 |14.1GiB | 1.83 | 19.60 | 190.35 | | 215 |H48 h8 k2 | 7.1GiB | 1.77 | 20.24 | | | 216 |H48 h7 k2 | 3.6GiB | 1.30 | 15.26 | | | 217 |H48 h6 k2 | 1.8GiB | 1.24 | 14.76 | | | 218 219 ## Comments on the results 220 221 * Adjusting for table size, vcube generally is a bit faster than H48, 222 except for 20 moves scrambles where H48 is a clear winner. 223 * On sets of 25 scrambles, H48 performs better than vcube when using 224 multiple threads. However, this advantage will likely disappear (or 225 even invert) if we increase the size of the set. 226 227 ## Other notes 228 229 * Missing values in the table mean that the test is very slow and I did 230 not want to wait for it to finish. I may add these values in the future. 231 * All the measurements above exclude the time needed to load the pruning 232 tables into memory, which can be quite significant for large tables. To 233 repeat these measurements, one can use the tool `301_solve_file`. For 234 example: 235 236 ``` 237 ./build tool solve_file h48h7k2 ./benchmarks/scrambles/scrambles-16.txt 238 ``` 239 240 To find all solutions, add something like `99999 0` at the end of the 241 command. This will tell the tool to find up to `99999` solutions that 242 are at most `0` moves longer than optimal. 243 * The measurements also excluded the one-off computation of the pruning 244 tables which, for reasons related to the cube coordinates used, is 245 significantly slower for H48 compared to vcube. 246 * H48's and vcube's approach to multithreading are extremely different: 247 H48 parallelize the search for each cube individually, vcube solves 248 multiple cubes in parallel by dedicating a single thread to each of 249 them. Both apporaches have pros and cons: vcube's approch has less 250 overhead in coordination between the threads, but often some threads 251 may be left without work when there are no more cubes left to solve. 252 * Per-cube parallelization means that H48 will always be faster than 253 vcube when solving a single cube. 254 * vcube only supports x86 processors (Intel, AMD), while H48 runs on any 255 architecture, including e.g. ARM (Macbook M series, android phones) 256 and can be compiled to WebAssembly as well.