nissy-core

The "engine" of nissy, including the H48 optimal solver.
git clone https://git.tronto.net/nissy-core

h48.md (23837B)


      1 # The H48 optimal solver
      2 
      3 This document contains information on the H48 Rubik's Cube optimal solver.
      4 This solver is occasionally improved and optimized, and this document
      5 is updated accordingly to reflect the current implementation.
      6 
      7 I highly encourage the reader to check out Jaap Scherphuis'
      8 [Computer Puzzling page](https://www.jaapsch.net/puzzles/compcube.htm)
      9 before reading this document.
     10 
Other background material that will be referenced throughout the text includes:
     12 
     13 * [Cube coordinates](https://sebastiano.tronto.net/speedcubing/coordinates)
     14   by Sebastiano Tronto
     15 * [Nxopt](https://github.com/rokicki/cube20src/blob/master/nxopt.md)
     16   by Tomas Rokicki
     17 * [The Mathematics behind Cube Explorer](https://kociemba.org/cube.htm)
  by Herbert Kociemba
     19 
Initially I intended to give minimal information here, citing the
above resources when needed, but I ended up rewriting even some of the
basic things in this document.
     23 
     24 ## Optimal solvers
     25 
     26 ### IDA*
     27 
     28 The basic idea behind the solver is an iterative-deepening
     29 depth-first search with pruning tables, also known as IDA*. You
     30 can find a detailed explanation of this idea in
     31 [Jaap's page](https://www.jaapsch.net/puzzles/compcube.htm#tree).
     32 
     33 To summarize, the IDA* algorithm finds the shortest solution(s) for a
     34 scrambled puzzle by using multiple depth-first searches at increasing
     35 depth. A breadth-first search is not applicable in this scenario because
     36 of the large amount of memory required.  At each depth, the puzzle's state
     37 is evaluated to obtain an estimated lower bound for how many moves are
     38 required to solve the puzzle; this is done by employing large pruning
     39 tables, as well as other techniques. If the lower bound exceeds the
     40 number of moves required to reach the current target depth, the branch is
     41 pruned. Otherwise, every possible move is tried, except those that would
     42 "cancel" with the preceding moves, obtaining new positions at depth N+1.
     43 
     44 ### Pruning tables and coordinates
     45 
A pruning table associates with each cube position a value that is a
lower bound for the number of moves it takes to solve that position.
To access this value, the cube must be turned into an index for the
given table. This is done using a
     50 [*coordinate*](https://sebastiano.tronto.net/speedcubing/coordinates),
     51 by which we mean a function from the set of (admissible, solvable) cube
     52 positions to the set of non-negative integers. The maximum value N of a
     53 coordinate must be less than the size of the pruning table; preferably
     54 there are no "gaps", that is the coordinate reaches all values between
     55 0 and N and the pruning table has size N+1.
     56 
     57 Coordinates are often derived from the cosets of a *target subgroup*.
     58 As an example, consider the pruning table that has one entry for each
     59 possible corner position, and the stored values denote the length of
     60 an optimal corner solution for that position. In this situation, the
     61 target subgroup is the set of all cube positions with solved corners
     62 (for the Mathematician: this is indeed a subgroup of the group of all
     63 cube positions). This group consists of `12! * 2^10` positions.  The set
     64 of cosets of this subgroup can be identified with the set of corner
     65 configurations, and it has size `8! * 3^7`.  By looking at corners,
     66 from each cube position we can compute a number from `0` to `8! * 3^7 - 1`
     67 and use it as an index for the pruning table.
     68 
     69 The pruning table can be filled in many ways. A common way is starting
     70 with the coordinate of the solved cube (often 0) and visiting the set
     71 of all coordinates with a breadth-first search. To do this, one needs
     72 to apply a cube move to a coordinate; this can either be done directly
(for example using transition tables, as explained in Jaap's page) or
     74 by converting the coordinate back into a valid cube position, applying
     75 the move and then computing the coordinate of the new position.
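A sketch of this breadth-first fill, on a toy coordinate where the moves act on the coordinate directly (playing the role of a transition table). Illustrative only, not the actual implementation:

```python
from collections import deque

N = 12
MOVES = (1, -1, 6)  # toy moves acting directly on the coordinate

def fill_ptable():
    """Start from the solved coordinate (0) and visit all coordinates
    breadth-first, recording the distance from solved."""
    UNSET = 255
    table = [UNSET] * N
    table[0] = 0
    queue = deque([0])
    while queue:
        c = queue.popleft()
        for m in MOVES:
            nc = (c + m) % N  # "transition table" step: move applied to coord
            if table[nc] == UNSET:
                table[nc] = table[c] + 1
                queue.append(nc)
    return table
```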
     76 
The larger a pruning table is, the more information it contains and
the faster the solver that uses it is. It is therefore convenient to
     79 compress the tables as much as possible. This can be achieved by using
     80 4 bits per position to store the lower bound, or as little as 2 bits
     81 per position with some caveats (see Jaap's page or Rokicki's nxopt
     82 description for two possible ways to achieve this).  Tables that use
     83 only 1 bit per position are possible, but the only information they
can provide is whether the given position requires more or less than a
     85 certain number of moves to be solved.
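For example, 2-bit entries can be packed four per byte along these lines (a generic sketch of the packing, not the actual memory layout of nissy-core):

```python
def set2(table: bytearray, i: int, v: int) -> None:
    """Store the 2-bit value v (0..3) at index i, four entries per byte."""
    shift = (i % 4) * 2
    table[i // 4] = (table[i // 4] & ~(3 << shift)) | ((v & 3) << shift)

def get2(table: bytearray, i: int) -> int:
    """Read the 2-bit value at index i."""
    return (table[i // 4] >> ((i % 4) * 2)) & 3
```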
     86 
     87 ### Symmetries
     88 
     89 Another, extremely effective way of reducing the size of a pruning table
     90 is using symmetries. In the example above of the corner table, one can
     91 notice that all corner positions that are one quarter-turn away from being
     92 solved are essentially the same: they can be turned one into the other
     93 by rotating the cube and optionally applying a mirroring transformation.
     94 Indeed, every position belongs to a set of up to 48 positions that are
     95 "essentially the same" in this way.  Keeping a separate entry in the
     96 pruning table for each of these positions is a waste of memory.
     97 
There are ways to reduce a coordinate by symmetry, grouping all
transformation-equivalent positions under the same index. You can
    100 find a description in my
    101 [cube coordinates page](https://sebastiano.tronto.net/speedcubing/coordinates).
    102 By doing so, the size of the pruning table is reduced without loss of
    103 information.
    104 
    105 Some well-known solvers (Cube Explorer, nxopt) do not take advantage
    106 of the full group of 48 symmetries of the cube, but they use only the
    107 16 symmetries that fix the UD axis. However, they make up for this by
    108 doing 3 pruning table lookups, one for each axis of the cube.
    109 
    110 Some cube positions are self-symmetric: when applying certain
    111 transformations to them, they remain the same. For example, the
    112 cube obtained by applying U2 to a solved cube is invariant with
    113 respect to the mirroring along the RL-plane. This fact has a couple
    114 of consequences:
    115 
    116 * Reducing by a group of N symmetries does not reduce the size of
    117   the pruning table by a factor of N, but by a slightly smaller
    118   factor. If the size of the coordinate is large, this fact is
    119   harmless.
    120 * When combining symmetry-reduced coordinates with other coordinates,
    121   one has to be extra careful with handling self-symmetric coordinates.
    122   An informal explanation of this phenomenon can be found in
    123   the [docs/nasty-symmetries.md document](./nasty-symmetries.md).
    124 
    125 ## The H48 solver
    126 
    127 ### The target subgroup: H48 coordinates
    128 
    129 The H48 solver uses a target group that is invariant under all 48
    130 symmetries. This group is defined as follows:
    131 
    132 * Each corner is placed in the *corner tetrad* it belongs to, and it
    133   is oriented. Here by corner tetrad we mean one of the two sets of
    134   corners {UFR, UBL, DFL, DBR} and {UFL, UBR, DFR, DBL}. For a corner
    135   that is placed in its own tetrad, the orientation does not depend on
    136   the reference axis chosen, and an oriented corner can be defined
  as having the U or D sticker facing U or D.
    138 * Each edge is placed in the *slice* it belongs to. The three edge slices
    139   are M = {UF, UB, DB, DF}, S = {UR, UL, DL, DR} and E = {FR, FL, BL, BR}.
    140 
    141 Other options are available for corners: for example, we could
    142 impose that the corner permutation is even or that corners are in
    143 *half-turn reduction* state, that is they are in the group generated
    144 by {U2, D2, R2, L2, F2, B2}. We settled on the corner group described
    145 above because it is easy to calculate and it gives enough options
    146 for pruning table sizes, depending on how edge orientation is
    147 considered (see the next section).
    148 
    149 We call the coordinate obtained from the target subgroup described
above the **h0** coordinate. This coordinate has `C(8,4) * 3^7 *
C(12,8) * C(8,4) = 5,304,568,500` possible values. If the corner
part of the coordinate is reduced by symmetry, it consists of only
3393 elements, giving a total of `3393 * C(12,8) * C(8,4) =
117,567,450` values. This is not very large for a pruning table,
    155 as with 2 bits per entry it would occupy less than 30MB. However,
    156 it is possible to take edge orientation (partially) into account
    157 to produce larger tables.
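These counts are easy to check, for instance in Python:

```python
from math import comb

# Sizes of the h0 coordinate, full and with symmetry-reduced corners.
h0_full = comb(8, 4) * 3**7 * comb(12, 8) * comb(8, 4)
h0_sym = 3393 * comb(12, 8) * comb(8, 4)  # corner part reduced by symmetry

assert h0_full == 5_304_568_500
assert h0_sym == 117_567_450
# At 2 bits per entry, four entries fit in a byte:
assert h0_sym // 4 < 30 * 10**6  # less than 30MB
```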
    158 
    159 ### Edge orientation
    160 
    161 Like for corners, the orientation of an edge is well-defined independently
    162 of any axis when said edge is placed in the slice it belongs to. We may
    163 modify the target subgroup defined in the previous section by imposing
    164 that the edges are all oriented. This yields a new coordinate, which we
call **h11**, whose full size with symmetry-reduced corners is `3393 *
C(12,8) * C(8,4) * 2^11 = 240,778,137,600`. A pruning table based on
    167 this coordinate with 2 bits per entry would occupy around 60GB, which
    168 is a little too much for most personal computers.
    169 
    170 One can wonder if it is possible to use a coordinate that considers
    171 the orientation of only *some* of the edges, which we may call **h1**
    172 to **h10**. Such coordinates do exist, but they are not invariant under
    173 the full symmetry group: indeed an edge whose orientation we keep track
    174 of could be moved to any of the untracked edges' positions by one of
    175 the symmetries, making the whole coordinate ill-defined.
    176 
    177 It is however possible to compute the symmetry-reduced pruning tables
    178 for these coordinates. One way to construct them is by taking the **h11**
pruning table and "forgetting" some of the edge orientation values,
collapsing 2 (or a higher power of 2) values into one by taking the minimum.
    181 It is also possible to compute these tables directly, as explained in
    182 the **Pruning table computation** section below.
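The "forgetting" step can be sketched as follows, on a made-up flat layout where the lowest index bits are the edge-orientation bits being dropped (the real tables are organized differently):

```python
def collapse_eo(h11_table, forgotten_bits=1):
    """Merge groups of 2**forgotten_bits entries that differ only in the
    forgotten edge-orientation bits, keeping the minimum: the minimum of
    valid lower bounds is still a valid lower bound."""
    k = 2 ** forgotten_bits
    return [min(h11_table[i * k : (i + 1) * k])
            for i in range(len(h11_table) // k)]
```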
    183 
    184 ### Coordinate computation for pruning value estimation
    185 
    186 In order to access the pruning value for a given cube position, we first
    187 need to compute its coordinate value. Let's use for simplicity the **h0**
    188 coordinate, but everything will be valid for any other coordinate of
    189 the type described in the previous section.
    190 
    191 As described in the symmetric-composite coordinate section of
    192 [my coordinates page](https://sebastiano.tronto.net/speedcubing/coordinates),
    193 we first need to compute the part of the coordinate where the symmetry
    194 is applied, that is the corner separation + orientation (also called
    195 *cocsep*). The value of the coordinate depends on the equivalence
class of the corner configuration, and is stored in a table called
    197 `cocsepdata`. To avoid lengthy computations, the index for a cube position
    198 in this table is determined by the corner orientation (between `0` and
    199 `3^7-1`) and a binary representation of the corner separation part (an
8-bit number with 4 zero digits and 4 one digits, where zeros denote
corners in one tetrad and ones denote corners in the other; the last bit
can be ignored, as it can easily be deduced from the other 7). This is
slightly less space-efficient than computing the actual corner-separation
coordinate as a value between `0` and `C(8,4) - 1`, but it is much faster.
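In code, this index computation looks roughly as follows (a hypothetical sketch, not the actual nissy-core function):

```python
def cocsep_index(orientations, tetrads):
    """Hypothetical index computation.
    orientations: 8 corner orientations in {0, 1, 2}; only the first 7
    matter, since the 8th is determined by the others.
    tetrads: 8 bits, 0 or 1 depending on which tetrad each corner
    currently sits in. The result combines the orientation value
    (0..3^7-1) with the 8-bit separation pattern."""
    co = 0
    for o in orientations[:7]:
        co = 3 * co + o
    sep = 0
    for b in tetrads:
        sep = 2 * sep + b
    return (co << 8) | sep
```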
    205 
    206 We can now access the value in the `cocsepdata` table corresponding to
    207 our cube position. This table contains the value of the symmetry-reduced
    208 coordinate and the so-called *ttrep* (transformation to representative)
    209 necessary to compute the full **h0** coordinate. For convenience,
    210 this table also stores a preliminary pruning value based on the corner
    211 coordinate.  If this value is large enough, we may skip the computation of
    212 the **h0** coordinate and any further estimate. These three values can be
stored in a single 32-bit integer, so that the table uses less than 12MB.
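A possible packing of the three values into a single 32-bit word (the field widths here are plausible guesses, not necessarily those used by nissy-core):

```python
def pack(symcoord, ttrep, pvalue):
    """symcoord < 3393 fits in 12 bits, ttrep < 48 in 6 bits, and the
    preliminary pruning value in 4 bits: 22 bits in total, well within
    a 32-bit word. Field widths are assumptions for illustration."""
    return (symcoord << 10) | (ttrep << 4) | pvalue

def unpack(word):
    return word >> 10, (word >> 4) & 0x3F, word & 0xF
```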
    214 
    215 ### Getting the pruning value
    216 
    217 Once we have computed the full h0 coordinate, we can access
    218 the correct entry in the full pruning table. We chose to
    219 use 2 bits per entry, as described by Rokicki in the [nxopt
    220 document](https://github.com/rokicki/cube20src/blob/master/nxopt.md).
    221 This means that every pruning table needs to have an associated *base
    222 value*, that determines the offset to be added to each entry (each entry
    223 can only be 0, 1, 2 or 3).  If the base value is `b`, a pruning value
    224 of 1, 2 or 3 can be used directly as a lower bound of b+1, b+2 and b+3
    225 respectively. However, a value of 0 could mean that the actual lower
    226 bound is anything between 0 and b, so we cannot take b as a lower bound.
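In other words, decoding an entry works like this (sketch):

```python
def lower_bound_from_entry(entry, base):
    """entry is the 2-bit value read from the table (0..3). Values 1..3
    give the lower bound base + entry; 0 only means "at most base", so
    without a further (fallback) lookup the only safe lower bound is 0."""
    return base + entry if entry != 0 else 0
```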
    227 
    228 #### Fallback tables
    229 
    230 To be able to still use some sort of pruning value even when we get a
    231 0 read, we use a **fallback table**. Inspired by nxopt, this table is
    232 interleaved with the main table for cache efficiency: every 254 entries
    233 (508 bits), we store in 4 bits the minimum of the pruning values of these
254 entries as a number from 0 to 15, without subtracting the table's
    235 base. Since a cache line is 512 bits long, we never get a cache miss
    236 when looking up the minimum value in the fallback table after a 0 read.
Smaller lines (of 256 or 128 bits) have been tried, but they do not
give any significant improvement over 512-bit lines.
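The resulting bit addressing can be written down explicitly (a sketch of the layout just described):

```python
ENTRIES_PER_LINE = 254  # 254 entries * 2 bits = 508 bits per 512-bit line

def entry_bit_address(i):
    """Bit offset of the 2-bit entry i inside the interleaved table."""
    line, off = divmod(i, ENTRIES_PER_LINE)
    return 512 * line + 2 * off

def fallback_bit_address(i):
    """Bit offset of the 4-bit minimum for the line containing entry i:
    the last 4 bits of the same 512-bit cache line, so reading it after
    a 0 read never incurs an extra cache miss."""
    return 512 * (i // ENTRIES_PER_LINE) + 508
```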
    239 
This trick provides the greatest performance benefits if the main pruning
table is properly aligned. Unfortunately, as a design choice, the
solver is implemented here as a library that defers all memory allocation
business to the implementor. We do make sure that the table is properly
    244 aligned in the programs provided in this repository (for example, the
    245 rudimentary shell and the tools), but we do not enforce this in the
    246 main library code.
    247 
    248 #### Additional (fast) lookups
    249 
    250 Moreover, as an additional heuristic, in case of a 0 read we also look
    251 up another pruning value in a table that takes into account only the
    252 position of the edges. This table is small (around 1MB), so repeated
    253 accesses to it are not too slow. In practice, this gives a small speed up
    254 of around 5% for random scrambles. However, for small solvers (low values
    255 of h) and positions where corners are close to solved, this fallback
    256 table gives dramatic improvements (up to a factor of 1000x in some
    257 manual tests).
    258 
    259 More tables could be used to refine the fallback estimate, but each
    260 additional table leads to longer lookup times, especially if it is too
    261 large to fit in cache.  Previous versions of this implementation (up
    262 to commit 6c42463, or to version 0.2) also included the possibility of
    263 a table with 4 bits per entry, at least for the `h0` case. Such tables
did not require a fallback lookup, but due to their large size they
turned out to be no more efficient. Therefore, they have been removed.
    266 
    267 ### Estimation refinements
    268 
    269 After computing the pruning value, there are a number of different tricks
    270 that can be used to improve the estimation.
    271 
    272 #### Inverse estimate
    273 
    274 A cube position and its inverse will, in general, give a different
    275 pruning value. If the normal estimate is not enough to prune the branch,
    276 the inverse of the position can be computed and a new estimate can be
    277 obtained from there.
    278 
    279 #### Reducing the branching factor - tight bounds
    280 
*This trick and the next are an adaptation of a similar technique
    282 introduced by Rokicki in nxopt.*
    283 
    284 We can take advantage of the fact that the **h0** (and **h11**) coordinate
is invariant under the subgroup `<U2, D2, R2, L2, F2, B2>`. Suppose we
are looking for a solution of length `m`, that we are at depth `d`, and
that the pruning value `p` *for the inverse position* is a strict bound,
that is
    288 `d+p = m`.  In this situation, from the inverse point of view the solution
    289 cannot end with a 180° move: if this was the case, since these moves are
    290 contained in the target group for **h0** (or **h11**), it must be that
    291 we were in the target subgroup as well *before the last move*, i.e. we
    292 have found a solution for the **h0** coordinate of length `p-1`. But this
    293 is in contradiction with the fact that the inverse pruning value is `p`.
    294 
    295 From all of this, we conclude that trying any 180° move as next move is
    296 useless (because it would be the last move from the inverse position).
    297 We can therefore reduce the branching factor significantly and only try
    298 quarter-turn moves.
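In pseudo-Python, the resulting restriction on the move list reads (move labels are illustrative, with half turns ending in "2"):

```python
def candidate_moves(depth, target, inverse_prune, moves):
    """If the bound from the inverse position is strict
    (depth + inverse_prune == target), a half turn cannot be the next
    move, by the argument above: only quarter turns are tried."""
    if depth + inverse_prune == target:
        return [m for m in moves if not m.endswith("2")]
    return list(moves)
```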
    299 
    300 #### Reducing the branching factor - switching to inverse
    301 
    302 We can expand on the previous trick by using a technique similar to NISS.
    303 
    304 Suppose that we have a strict bound for the *normal position*. As above,
    305 we can deduce that the last move cannot be a 180° move, but this does not
    306 tell us anything about the possibilities for the next move. However, if
    307 we *switch to the inverse scramble* we can take advantage of this fact
    308 as described above. For doing this, we need to replace the cube with its
    309 inverse, and keep track of the moves done from now on so that we can
    310 invert them at the end to construct the final solution.
    311 
    312 When this technique is used, it is also possible to avoid some table
    313 lookups: if the last move applied to the cube is a 180° move *on the
    314 inverse position*, then the coordinate on the normal position has not
    315 changed. Thus if we keep track of the last computed pruning value,
    316 we can reuse it and avoid an expensive table lookup.
    317 
    318 ### Pruning pipeline and prefetching
    319 
    320 To improve the memory access pattern and exploit
    321 [prefetching](https://en.wikipedia.org/wiki/Cache_prefetching)
    322 opportunities, we don't expand each neighbor one by one in the
    323 pruning phase. Instead, we employ a *pruning pipeline*, where we
    324 first compute the cube and inverse position for each neighbor, and
    325 then we proceed through a series of stages, at each of which we
prune off some nodes and pre-compute the data
    327 necessary for the next step.
    328 
This pipeline-based strategy gives a speedup of around 25%.
    330 Adding manual prefetching, we get an additional 20% speedup.
    331 
    332 ### Other optimizations
    333 
    334 The H48 solver uses various other optimizations.
    335 
    336 #### Avoiding inverse cube computation
    337 
Computing the inverse of a cube position is expensive. We avoid doing
that (for the inverse pruning value estimate) by bringing along both
the normal and the inverse cube during the search and using *premoves*
to apply moves to the inverse scramble.
    342 
    343 #### Multi-threading
    344 
The solution search is parallelized *per scramble*. Although when solving
multiple positions it would be more efficient to solve them in parallel
by dedicating a single thread to each of them, the current H48 solver
optimizes for solving a single cube at a time.
    349 
    350 To do this, we first compute all cube positions that are 4 moves away
    351 from the scramble. This way we prepare up to 43254 *tasks* (depending
    352 on the symmetries of the starting position, which we take advantage of)
    353 which are equally split between the available threads.  Any solution
    354 encountered in this step is of course added to the list of solutions.
    355 
    356 #### Filtering out symmetric cases
    357 
    358 If the cube position we wish to solve is, say, mirrored along the RL
    359 plane, it is redundant to attempt starting a solution with both R and L',
    360 as they would lead to mirrored versions of the same solution. Therefore,
    361 we filter out some tasks by looking for symmetries before each of the
    362 first 4 moves. Then, unless we are looking for only one solution, we
    363 have to make sure to report back all symmetric variations of a solution,
    364 so we need to keep track of which transformations were available at each
    365 of these 4 points.
    366 
    367 This can reduce the number of starting positions drastically: for
    368 example, for the superflip our method produces 1038 starting tasks -
a 41.67x improvement! Of course this optimization has no effect on
positions that are more than 3 moves away from any symmetric position -
which is the vast majority of them.
    372 
    373 #### Heuristically sorting tasks
    374 
    375 The tasks described in the previous paragraph (multi-threading) are
    376 initially searched in an arbitrary order. However, after searching
    377 at a sufficient depth, we have gathered some data that allows us to
make some heuristic improvements: the tasks that lead to visiting
more positions (in other words, those where we exceed the estimated lower
bounds less often) are more likely to yield the optimal solution. Thus
    381 we sort the tasks based on this, adjusting by a small factor due to the
    382 fact that sequences ending in U, R or F moves have more continuations
    383 than those ending in D, L or B moves - as we don't allow, for example,
    384 both U D and D U, but only the former.
    385 
    386 Preliminary benchmarks show a performance improvement of around 40%
    387 when searching a single solution. When searching for multiple optimal
solutions the effects of this optimization will be less pronounced, and
they are obviously nonexistent when looking for *all* optimal solutions.
    390 
    391 ## Pruning table computation
    392 
    393 We first describe how the pruning table would be computed if we stored
    394 each pruning value fully, using 4 bits per entry. This algorithm used
    395 to be implemented (up to commit 6c42463), but has since been removed.
    396 
    397 ### Full tables for h0 and h11 (not implemented)
    398 
    399 If we are allowed to store the full pruning value for each entry,
    400 computing the pruning tables is not that difficult.  First, we set the
    401 value of the solved cube's coordinate to 0 and every other entry to 15,
    402 the largest 4-bit integer. Then we iteratively scan through the table and
    403 compute the coordinates at depth n+1 from those at depth n as follows:
    404 for each coordinate at depth n, we compute a valid representative for it,
we apply each of the possible 18 moves to it, and we set the value of
the resulting coordinates to n+1, if not already set. Once the table is
filled or we have reached depth 15, we stop.
    407 
Unfortunately what I described above is an oversimplification: one
must also take into account the case where the corner coordinate is
    410 self-symmetric, and handle it accordingly by making sure that every
    411 symmetric variation of the same coordinate has its value set at the
    412 right iteration.
    413 
When this algorithm reaches the last few iterations, the coordinates at
depth n are many more than those whose depth is still unknown. Therefore
it is convenient to look at the unset coordinates and check if they have
any neighbor whose depth is n; if so, we can set such a coordinate to n+1.
    418 Thanks to [torchlight](https://github.com/torchlight) for suggesting
    419 this optimization.
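A sketch of this late-iteration pass, on an abstract coordinate space where `neighbors(c)` stands in for applying the 18 moves:

```python
def backward_pass(table, neighbors, depth, unset=255):
    """Instead of expanding the many coordinates already at `depth`,
    scan the few still-unset ones: any unset coordinate with a neighbor
    at `depth` is at depth + 1. Returns the number of entries set."""
    changed = 0
    for c in range(len(table)):
        if table[c] == unset and any(table[n] == depth for n in neighbors(c)):
            table[c] = depth + 1
            changed += 1
    return changed
```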
    420 
    421 Finally, this algorithm can be easily parallelized by dividing the set
    422 of coordinates into separate sections, but one must take into account
    423 that a coordinate and its neighbors are usually not in the same section.
    424 
    425 ### 2 bit tables
    426 
    427 Computing the pruning tables for intermediate coordinates is not as
    428 simple.  The reason is that these coordinates are not invariant under
    429 the 48 transformations of the cube, but we are treating them as such.
    430 Take for example the case of **h10**. Each **h10** coordinate is
    431 represented by either of two **h11** coordinates: the one where the 11th
    432 edge is correctly oriented and the one where this edge is flipped (of
    433 course, the orientation of the 12th edge can be deduced from the parity
    434 of the orientation of the other 11). We can take any cube whose **h11**
    435 coordinate is one of these two to represent our **h10** coordinate, but
    436 the set of neighbors will change depending on which representative we
    437 picked. In other words: if I take one of the two cubes and make a move,
    438 the position reached is not necessarily obtainable by applying a move
    439 to the other cube. This means that the same algorithm that we described
    440 for the "real coordinate" case cannot be applied here.
    441 
It is still possible to do a brute-force depth-first search to fill
these tables. To make it reasonably fast, we apply a couple of
optimizations. First of all, we restrict ourselves to computing a 2-bit
table, so we can stop the search early.
    446 
    447 The second optimization we employ consists in pre-computing all possible
    448 **h11** coordinates at a fixed depth, and storing their depth in a hash
    449 map indexed by their **h11** coordinate value. This way we do not have
to brute-force our way through from depth 0, and it makes this method
considerably faster. Unfortunately, since the number of coordinates
at a certain depth increases exponentially with the depth, we are for
now limited to storing the coordinates at depth 8.
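The idea can be sketched on a toy coordinate space (a counter modulo 12; `DEPTH_MAP` plays the role of the hash map, here holding every coordinate at depth at most 1; all names and the move set are illustrative):

```python
import math

N = 12
MOVES = (1, -1, 6)
DEPTH_MAP = {0: 0, 1: 1, 11: 1, 6: 1}  # precomputed exact depths

def estimate_depth(pos, max_extra):
    """Search away from `pos`; whenever we reach a coordinate stored in
    the map, moves-made + stored-depth bounds the distance of `pos`.
    In this simplified sketch the result is an upper bound, exact as
    long as `max_extra` is large enough."""
    best = math.inf

    def rec(p, d):
        nonlocal best
        if p in DEPTH_MAP:
            best = min(best, d + DEPTH_MAP[p])
        if d < max_extra:
            for m in MOVES:
                rec((p + m) % N, d + 1)

    rec(pos, 0)
    return best
```

In the actual implementation the map stores **h11** coordinates at depth 8, and the search runs over cube positions rather than raw coordinates.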
    454 
    455 This method works for the "real" coordinates **h0** and **h11** as well.
Moreover, in this case one can optimize it further by avoiding repeating
the search from a coordinate that has already been visited. Further
optimizations are possible for **h0** and **h11**, and we may implement
    459 them in the future.