h48.md - h48 - A prototype for an optimal Rubik's cube solver, work in progress.

h48.md (21449B)
      1 # The H48 optimal solver
      2 
      3 This document contains information on the H48 Rubik's Cube optimal solver.
      4 The implementation of the solver is still in progress. This document
      5 partly describes ideas that have not been implemented yet, and it will
      6 be updated to reflect the actual implementation.
      7 
      8 I highly encourage the reader to check out Jaap Scherphuis'
      9 [Computer Puzzling page](https://www.jaapsch.net/puzzles/compcube.htm)
     10 before reading this document.
     11 
     12 Other backround material that will be referenced throughout the text includes:
     13 
     14 * [Cube coordinates](https://sebastiano.tronto.net/speedcubing/coordinates)
     15   by Sebastiano Tronto
     16 * [Nxopt](https://github.com/rokicki/cube20src/blob/master/nxopt.md)
     17   by Tomas Rokicki
     18 * [The Mathematics behind Cube Explorer](https://kociemba.org/cube.htm)
     19   by Herber Kociemba
     20 
     21 Initially I intended to give minimum information here, citing the
     22 above resources when needed, but I ended up rewriting even some of the
     23 basic things in this document.
     24 
     25 ## Optimal solvers
     26 
     27 ### IDA*
     28 
     29 The basic idea behind the solver is an iterative-deepening
     30 depth-first search with pruning tables, also known as IDA*. You
     31 can find a detailed explanation of this idea in
     32 [Jaap's page](https://www.jaapsch.net/puzzles/compcube.htm#tree).
     33 
     34 To summarize, the IDA* algorithm finds the shortest solution(s) for a
     35 scrambled puzzle by using multiple depth-first searches at increasing
     36 depth. A breadth-first search is not applicable in this scenario because
     37 of the large amount of memory required.  At each depth, the puzzle's state
     38 is evaluated to obtain an estimated lower bound for how many moves are
     39 required to solve the puzzle; this is done by employing large pruning
     40 tables, as well as other techniques. If the lower bound exceeds the
     41 number of moves required to reach the current target depth, the branch is
     42 pruned. Otherwise, every possible move is tried, except those that would
     43 "cancel" with the preceding moves, obtaining new positions at depth N+1.
     44 
     45 ### Pruning tables and coordinates
     46 
     47 A pruning table associates to a cube position a value that is a
     48 lower bound for the number of moves it takes to solve that position.
     49 To acces this value, the cube must be turned into an index for the
     50 given table. This is done using a
     51 [*coordinate*](https://sebastiano.tronto.net/speedcubing/coordinates),
     52 by which we mean a function from the set of (admissible, solvable) cube
     53 positions to the set of non-negative integers. The maximum value N of a
     54 coordinate must be less than the size of the pruning table; preferably
     55 there are no "gaps", that is the coordinate reaches all values between
     56 0 and N and the pruning table has size N+1.
     57 
     58 Coordinates are often derived from the cosets of a *target subgroup*.
     59 As an example, consider the pruning table that has one entry for each
     60 possible corner position, and the stored values denote the length of
     61 an optimal corner solution for that position. In this situation, the
     62 target subgroup is the set of all cube positions with solved corners
     63 (for the Mathematician: this is indeed a subgroup of the group of all
     64 cube positions). This group consists of `12! * 2^10` positions.  The set
     65 of cosets of this subgroup can be identified with the set of corner
     66 configurations, and it has size `8! * 3^7`.  By looking at corners,
     67 from each cube position we can compute a number from `0` to `8! * 3^7 - 1`
     68 and use it as an index for the pruning table.
     69 
     70 The pruning table can be filled in many ways. A common way is starting
     71 with the coordinate of the solved cube (often 0) and visiting the set
     72 of all coordinates with a breadth-first search. To do this, one needs
     73 to apply a cube move to a coordinate; this can either be done directly
     74 (for example using transistion tables, as explained in Jaap's page) or
     75 by converting the coordinate back into a valid cube position, applying
     76 the move and then computing the coordinate of the new position.
     77 
     78 The largest a pruning table is, the more information it contains,
     79 the faster the solvers that uses it is. It is therefore convenient to
     80 compress the tables as much as possible. This can be achieved by using
     81 4 bits per position to store the lower bound, or as little as 2 bits
     82 per position with some caveats (see Jaap's page or Rokicki's nxopt
     83 description for two possible ways to achieve this).  Tables that use
     84 only 1 bit per position are possible, but the only information they
     85 can provide is wether the given position requires more or less than a
     86 certain number of moves to be solved.
     87 
     88 ### Symmetries
     89 
     90 Another, extremely effective way of reducing the size of a pruning table
     91 is using symmetries. In the example above of the corner table, one can
     92 notice that all corner positions that are one quarter-turn away from being
     93 solved are essentially the same: they can be turned one into the other
     94 by rotating the cube and optionally applying a mirroring transformation.
     95 Indeed, every position belongs to a set of up to 48 positions that are
     96 "essentially the same" in this way.  Keeping a separate entry in the
     97 pruning table for each of these positions is a waste of memory.
     98 
     99 There are ways to reduce a coordinate by symmetry, grouping every
    100 transformation-equivalent position into the same index. You can
    101 find a description in my
    102 [cube coordinates page](https://sebastiano.tronto.net/speedcubing/coordinates).
    103 By doing so, the size of the pruning table is reduced without loss of
    104 information.
    105 
    106 Some well-known solvers (Cube Explorer, nxopt) do not take advantage
    107 of the full group of 48 symmetries of the cube, but they use only the
    108 16 symmetries that fix the UD axis. However, they make up for this by
    109 doing 3 pruning table lookups, one for each axis of the cube.
    110 
    111 Some cube positions are self-symmetric: when applying certain
    112 transformations to them, they remain the same. For example, the
    113 cube obtained by applying U2 to a solved cube is invariant with
    114 respect to the mirroring along the RL-plane. This fact has a couple
    115 of consequences:
    116 
    117 * Reducing by a group of N symmetries does not reduce the size of
    118   the pruning table by a factor of N, but by a slightly smaller
    119   factor. If the size of the coordinate is large, this fact is
    120   harmless.
    121 * When combining symmetry-reduced coordinates with other coordinates,
    122   one has to be extra careful with handling self-symmetric coordinates.
    123   This is a much more painful point, but a precise description of
    124   the adjustments needed to handle these cases is out of the scope
    125   of this document.
    126 
    127 ## The H48 solver
    128 
    129 ### The target subgroup: H48 coordinates
    130 
    131 The H48 solver uses a target group that is invariant under all 48
    132 symmetries. This group is defined as follows:
    133 
    134 * Each corner is placed in the *corner tetrad* it belongs to, and it
    135   is oriented. Here by corner tetrad we mean one of the two sets of
    136   corners {UFR, UBL, DFL, DBR} and {UFL, UBR, DFR, DBL}. For a corner
    137   that is placed in its own tetrad, the orientation does not depend on
    138   the reference axis chosen, and an oriented corner can be defined
    139   has having the U or D sticker facing U or D.
    140 * Each edge is placed in the *slice* it belongs to. The three edge slices
    141   are M = {UF, UB, DB, DF}, S = {UR, UL, DL, DR} and E = {FR, FL, BL, BR}.
    142 
    143 Other options are available for corners: for example, we could
    144 impose that the corner permutation is even or that corners are in
    145 *half-turn reduction* state, that is they are in the group generated
    146 by {U2, D2, R2, L2, F2, B2}. We settled on the corner group described
    147 above because it is easy to calculate and it gives enough options
    148 for pruning table sizes, depending on how edge orientation is
    149 considered (see the next section).
    150 
    151 We call the coordinate obtained from the target subgroup described
    152 above the **h0** coordinate. This coordinate has `C(8,4) * 3^7 *
    153 C(12,8) * C(8,4) = 5.304.568.500` possible values.  If the corner
    154 part of the coordinate is reduced by symmetry, it consists of only
    155 3393 elements, giving a total of `3393 * C(12,8) * C(8,4) =
    156 117.567.450` values. This is not very large for a pruning table,
    157 as with 2 bits per entry it would occupy less than 30MB. However,
    158 it is possible to take edge orientation (partially) into account
    159 to produce larger tables.
    160 
    161 ### Edge orientation
    162 
    163 Like for corners, the orientation of an edge is well-defined independently
    164 of any axis when said edge is placed in the slice it belongs to. We may
    165 modify the target subgroup defined in the previous section by imposing
    166 that the edges are all oriented. This yields a new coordinate, which we
    167 call **h11**, whose full size with symmetry-reduced corners is `3393 *
    168 C(12,8) * C(8,4) * 2^11 = 240.778.137.600`. A pruning table based on
    169 this coordinate with 2 bits per entry would occupy around 60GB, which
    170 is a little too much for most personal computers.
    171 
    172 One can wonder if it is possible to use a coordinate that considers
    173 the orientation of only *some* of the edges, which we may call **h1**
    174 to **h10**. Such coordinates do exist, but they are not invariant under
    175 the full symmetry group: indeed an edge whose orientation we keep track
    176 of could be moved to any of the untracked edges' positions by one of
    177 the symmetries, making the whole coordinate ill-defined.
    178 
    179 It is however possible to compute the symmetry-reduced pruning tables
    180 for these coordinates. One way to construct them is by taking the **h11**
    181 pruning table and "forgetting" about some of the edge orientation values,
    182 collapsing 2 (or a power thereof) values to one by taking the minimum.
    183 It is also possible to compute these tables directly, as explained in
    184 the **Pruning table computation** section below.
    185 
    186 ### Coordinate computation for pruning value estimation
    187 
    188 In order to access the pruning value for a given cube position, we first
    189 need to compute its coordinate value. Let's use for simplicity the **h0**
    190 coordinate, but everything will be valid for any other coordinate of
    191 the type described in the previous section.
    192 
    193 As described in the symmetric-composite coordinate section of
    194 [my coordinates page](https://sebastiano.tronto.net/speedcubing/coordinates),
    195 we first need to compute the part of the coordinate where the symmetry
    196 is applied, that is the corner separation + orientation (also called
    197 *cocsep*). The value of the coordinate depends on the equivalence
    198 class of the corner configuration, and is memorized in a table called
    199 `cocsepdata`. To avoid lengthy computations, the index for a cube position
    200 in this table is determined by the corner orientation (between `0` and
    201 `3^7-1`) and a binary representation of the corner separation part (an
    202 8-bit number with 4 zero digits and 4 one digits, where zeros denote
    203 corners in one tetrad and 1 denotes corners in the other; the last bit
    204 can be ignored, as it can easily be deduced from the other 7). This is
    205 slightly less space-efficient than computing the actual corner-separation
    206 coordinate as a value between `0` and `C(8,4)`, but it is much faster.
    207 
    208 We can now access the value in the `cocsepdata` table corresponding to
    209 our cube position. This table contains the value of the symmetry-reduced
    210 coordinate and the so-called *ttrep* (transformation to representative)
    211 necessary to compute the full **h0** coordinate. For convenience,
    212 this table also stores a preliminary pruning value based on the corner
    213 coordinate.  If this value is large enough, we may skip the computation of
    214 the **h0** coordinate and any further estimate. These three values can be
    215 stored in a single 32 bit integer, so that the table uses less than 12MB.
    216 
    217 ### Getting the pruning value
    218 
    219 Once we have computed the full h0 coordinate, we can access the correct
    220 entry in the full pruning table. As mentioned above, the pruning table
    221 can be one of three kinds:
    222 
    223 * 4 bits per entry, or `k4`: In this case the pruning value (between 0
    224   and 15) can be simply read off the table.
    225 * 2 bits per entry, or `k2`: Tables of this kind work as described by
    226   Rokicki in the
    227   [nxopt document](https://github.com/rokicki/cube20src/blob/master/nxopt.md).
    228   In this case the pruning table also has a *base value*, that determines
    229   the offset to be added to each entry (each entry can only be 0, 1, 2 or 3).
    230   If the base value is `b`, a pruning value of 1, 2 or 3 can be used directly
    231   as a lower bound of b+1, b+2 and b+3 respectively. However, a value of 0
    232   could mean that the actual lower bound is anything between 0 and b, so we
    233   cannot take b as a lower bound. Instead we have to use a pruning value from
    234   another table - see the section "Fallback tables" below.
    235 * 1 bit per entry, or `k1`: With one bit per entry, the only information we
    236   can get from the pruning table is wether or not the current position
    237   requires more or fewer moves than a fixed base value b. This can still be
    238   valuable if most positions are more or less equally split between two
    239   pruning values.
    240   (Work in progress - `k1` tables not available in the code yet)
    241 
    242 ### Fallback tables
    243 
    244 When a pruning table does not store the exact lower bound value, for
    245 example in the case of `k2` table as described above, we need to refine
    246 our estimate using another table, which we call *fallback table*.
    247 In the current implementation, we actually use two different tables:
    248 
    249 * **h0** table: for larger `k2` tables we get a fallback value from the full
    250   table for the **h0** coordinate.
    251 * edges-only table: to improve the pruning value for certain specific
    252   scrambles, namely whose with solved corners such as the superflip,
    253   we employ a second table that takes into account the edges of a full
    254   **h11** coordinate. This table is small (around 1MB).
    255 
    256 More tables could be used to refine the fallback estimate, but each
    257 additional table leads to longer lookup times, especially if it is
    258 too large to fit in cache.
    259 
    260 ### Estimation refinements
    261 
    262 After computing the pruning value, there are a number of different tricks
    263 that can be used to improve the estimation.
    264 
    265 #### Inverse estimate
    266 
    267 A cube position and its inverse will, in general, give a different
    268 pruning value. If the normal estimate is not enough to prune the branch,
    269 the inverse of the position can be computed and a new estimate can be
    270 obtained from there.
    271 
    272 #### Reducing the branching factor - tight bounds
    273 
    274 *This trick and the next are and adaptation of a similar technique
    275 introduced by Rokicki in nxopt.*
    276 
    277 We can take advantage of the fact that the **h0** (and **h11**) coordinate
    278 is invariant under the subgroup `<U2, D2, R2, L2, F2, B2>`. Suppose we
    279 are looking for a solution of length `m`, we are at depth `d` and the
    280 pruning value `p` *for the inverse position* is a strict bound, that is
    281 `d+p = m`.  In this situation, from the inverse point of view the solution
    282 cannot end with a 180° move: if this was the case, since these moves are
    283 contained in the target group for **h0** (or **h11**), it must be that
    284 we were in the target subgroup as well *before the last move*, i.e. we
    285 have found a solution for the **h0** coordinate of length `p-1`. But this
    286 is in contradiction with the fact that the inverse pruning value is `p`.
    287 
    288 From all of this, we conclude that trying any 180° move as next move is
    289 useless (because it would be the last move from the inverse position).
    290 We can therefore reduce the branching factor significantly and only try
    291 quarter-turn moves.
    292 
    293 #### Reducing the branching factor - switching to inverse
    294 
    295 We can expand on the previous trick by using a technique similar to NISS.
    296 
    297 Suppose that we have a strict bound for the *normal position*. As above,
    298 we can deduce that the last move cannot be a 180° move, but this does not
    299 tell us anything about the possibilities for the next move. However, if
    300 we *switch to the inverse scramble* we can take advantage of this fact
    301 as described above. For doing this, we need to replace the cube with its
    302 inverse, and keep track of the moves done from now on so that we can
    303 invert them at the end to construct the final solution.
    304 
    305 When this technique is used, it is also possible to avoid some table
    306 lookups: if the last move applied to the cube is a 180° move *on the
    307 inverse position*, then the coordinate on the normal position has not
    308 changed. Thus if we keep track of the last computed pruning value,
    309 we can reuse it and avoid an expensive table lookup.
    310 
    311 ### Other optimizations
    312 
    313 Other possible (low-level) optimizations include:
    314 
    315 * **Avoid inverse computation**: computing the inverse of a cube position
    316   is expensive. We can avoid doing that (for the inverse pruning value
    317   estimate) if we bring along both the normal and the inverse cube during
    318   the search, and we use *premoves* to apply moves to the inverse scramble.
    319 * **Multi-threading (multiple scrambles)**: It is easy to parallelize this
    320   algorithm when solving multiple cubes at once, by firing up multiple
    321   instances of the solver. It is important to make sure that the same
    322   (read-only) pruning table is used for all instances, to avoid expensive
    323   memory duplication.
    324 * **Multi-threading (single scramble)**: It is also possible to parallelize
    325   the search for a single scramble. For example, we can generate 18 different
    326   cubes, one for each possible starting move, and solve each of them in a
    327   separate thread. Some coordination between threads is necessary to stop
    328   the search when the desired number of solutions has been found.
    329 
    330 ## Pruning table computation
    331 
    332 ### 4 bits tables for h0 and h11
    333 
    334 Computing the pruning table for a "real" h48 coordinate, that **h0**
    335 or **h11**, using 4 bits per entry is quite simple. The method currently
    336 implemented works as follows.
    337 
    338 First, we set the value of the solved cube's coordinate to 0 and every
    339 other entry to 15, the largest 4-bit integer. Then we iteratively scan
    340 through the table and compute the coordinates at depth n+1 from those at
    341 depth n as follows: for each coordinate at depth n, we compute a valid
    342 representative for it, we apply each of the possible 18 moves to it, and
    343 we set their value to n+1. Once the table is filled or we have reached
    344 depth 15, we stop (there are no **h0** coordinates at depth 16 or more,
    345 but I currently don't know if this is the case for **h11**).
    346 
    347 Unfortunately what I described above is an oversimplification: one
    348 also must take into account the case where the corner coordinate is
    349 self-symmetric, and handle it accordingly by making sure that every
    350 symmetric variation of the same coordinate has its value set at the
    351 right iteration.
    352 
    353 When this algorithm reaches the last few iterations, the coordinates at
    354 depth n are many more than those whose depth is still unknown. Therefore
    355 it is convenient to look at the unset coordinate and check if they have
    356 any neighbor whose depth is i; if so, we can set such a coordinate to i+1.
    357 Thanks to [torchlight](https://github.com/torchlight) for suggesting
    358 this optimization.
    359 
    360 Finally, this algorithm can be easily parallelized by dividing the set
    361 of coordinates into separate sections, but one must take into account
    362 that a coordinate and its neighbors are usually not in the same section.
    363 
    364 This method is currently implemented only for **h0**, since the 4 bits
    365 table for **h11** would require around 115GB of RAM to compute, and I
    366 don't have this much memory.
    367 
    368 ### 2 bits tables with for h0 and h11
    369 
    370 (Work in progress - currently there is no specialized routine for
    371 computing 2 bits table for "real" coordinates; instead, an optimized
    372 version of the generic method explained below is used)
    373 
    374 ### A generic method for intermediate coordinates (from h1 to h10)
    375 
    376 (Work in progress - this method is quite slow and it may be replaced
    377 by a better algorithm in the future)
    378 
    379 Computing the pruning tables for intermediate coordinates is not as
    380 simple.  The reason is that these coordinates are not invariant under
    381 the 48 transformations of the cube, but we are treating them as such.
    382 Take for example the case of **h10**. Each **h10** coordinate is
    383 represented by either of two **h11** coordinates: the one where the 11th
    384 edge is correctly oriented and the one where this edge is flipped (of
    385 course, the orientation of the 12th edge can be deduced from the parity
    386 of the orientation of the other 11). We can take any cube whose **h11**
    387 coordinate is one of these two to represent our **h10** coordinate, but
    388 the set of neighbors will change depending on which representative we
    389 picked. In other words: if I take one of the two cubes and make a move,
    390 the position reached is not necessarily obtainable by applying a move
    391 to the other cube. This means that the same algorithm that we described
    392 for the "real coordinate" case cannot be applied here.
    393 
    394 It is still possible to do a brute-force depth-first seach to fill
    395 these tables.  To make it reasonably fast, we apply a couple of
    396 optimizations. First of all, we restrict ourselves to computing a 2 bits
    397 table, so we can stop the search early.
    398 
    399 The second optimization we employ consists in pre-computing all possible
    400 **h11** coordinates at a fixed depth, and storing their depth in a hash
    401 map indexed by their **h11** coordinate value. This way we do not have
    402 to brute-force our way through from depth 0, and it make this method
    403 considerably faster. Unfortunately, since the number of coordinates
    404 at a certain depth increases exponentially with the depth, we are for
    405 now limited at storing the coordinates at depth 8. (Work in progress -
    406 we may experiment with depth 9 in the future)
    407 
    408 This method works for the "real" coordinates **h0** and **h11** as well.
    409 Moreover, in this case one can optimize it further by avoiding to repeat
    410 the search from a coordinate that has already been visited. (Work
    411 in progress - this method will be replaced in the future by a more
    412 efficient one)
	h48 A prototype for an optimal Rubik's cube solver, work in progress.
	git clone https://git.tronto.net/h48
	Download \| Log \| Files \| Refs \| README \| LICENSE