h48

A prototype for an optimal Rubik's cube solver, work in progress.
git clone https://git.tronto.net/h48
Download | Log | Files | Refs | README | LICENSE

commit f25a10e19eca294c4e6a99e4f80ce5cfd11a0e5f
parent 4a016679d1f5f33ab0679dfc315222797075dc65
Author: Sebastiano Tronto <sebastiano@tronto.net>
Date:   Sun, 18 Aug 2024 10:41:52 +0200

Moved TODO.txt to personal folder

Diffstat:
DTODO.txt | 193-------------------------------------------------------------------------------
1 file changed, 0 insertions(+), 193 deletions(-)

diff --git a/TODO.txt b/TODO.txt @@ -1,193 +0,0 @@ -# Documentation and cleanup - -- Internal documentation - - h48 solver and co - - what is h and k - - cube representation and base routines? -- Split files - - h48 into: - - h48_base (coordinate computation) - - h48_map (only the hashmap thing) - - h48_gendata (table generation) - - h48_solve (including stats solver) - -# H48 table generation - - compute all tables for h<11 - x compute visited up to a fixed depth 8 - - compute additional step (if needed) to fill <=base - - brute-force the last 2 steps (only 18 moves + 18*15 move pairs, + sim) - - compare with known h0 results (from long-running test) - - compute table for h=11 - - can it be unified to the other computation, or is it much better - to do it ad hoc? - - derive small tables from the large one to check correctness - - this requires too much ram, but I can print the summary of the table - - Add long-running test for h0k4 (maybe as a tool?) - - tests for other sizes? - - gendata tool: save tables? - - optimize - - use only transform_edges (need compose trans) - - parallelize with pthread - -# Solver (Enrico) - -- Add a solver for h=0 -- check if this (or equivalent) works: - ./run solve -solver H48 -options "2;20" -n 1 -M 10 -cube \ - "$(./run frommoves -moves "UFRUFU")" - -table base for k=2 (4 most common values start at) - 0 8 - 1 8 - 2 8 - 3 8 or 9 (very close) - 4 9 - 5 9 - 6 9 - 7 9 or 10 (very close) - 8 10 - 9 10 - 10 10 - 11 11 - -Solver - - cleanup h48 solver - - do not copy dfsarg, change and undo - - implement and use premove (and test) instead of inverting - - improve name of tables file in shell.c (include h value, maybe k, max) - - benchmark for solve - table generation, where to keep tables? in benchmark folder or in tables/? - - more tricks for solver, optimize, try larger tables - - remove solve_simple and maybe the whole solve_generic - - shell: silently accept other formats too? - - shell: allow generating multiple tables for different options - - gendata: move info at start of tables? - -Cleanup cube_public and interface - - remove options, use only solver name - cleanup also benchmark - - write a generic parse + dispatch to solver - (maybe use helper function from solve_h48 for parsing hXkY) - -Goal: find out which k value is best - - temporarily call current table and solver "k4" instead of h48 - - write table generation and solver for k2 and k1 - - benchmark for different sizes! - -Improvements - - check hash of generated data - - use interleaved tables (e.g. big table with k=2 or k=1 and interleaved - small table with k=4 for better backup pruning) - -small things - - maybe move part of the logic for coord_h48 (and its inverse) to - utils.h (subsettoindex-like) - - rename TYPE build switch to something more intuitive like ARCH - remove one of TYPE and CUBE_TYPE (why do I have two?) - - merge constants and utils? - -## H48 optimal solver (some has already been implemented) - -First compute co + csep. Use csep as a binary number (2^7 instead of 70, -loose a factor of 1.8 but still fits in a few megabytes or less). Use -co + csep as an index in a table whose entries have: 6 bits for ttrep, -12 bits for rep, 4 bits for pruning. Optionally, 4 more bits could be -used for the base of the pruning table, if we want to have a different -base for each corner state; but probably not useful. - -If the first pruning is enough, or if the base value of the pruning table -(see below) is too low, do not compute the full coordinate (which includes -epsep + partial EO, 12 different sizes depending on how many edges). - -Otherwise, transform edges only using ttrep and compute full coordinate. -Look up in table. 3 types of table: -1. 4 bits per entry, full pruning table -2. 3 bits with base value (let's try, why not) -3. 2 bits with base value, nxopt style -4. 1 bit per entry, telling only if more or less than mid value -Types 2-4 require benchmarks, a lot of them. - -Inverse probing (no need to compute inverse, compute one at the beginning -and keep adding premoves); better do first part of pruning for both -normal and inverse and only then search in the full table. - -If inverse probing gives tight bound, reduce branching factor, optionally -switch. Here NISS may be useful. - -## Other solvers - -* nxopt (various sizes, for comparison; also use base value probing - and benchmarking) -* Coordinate solver for replacing nissy backend (specify in the comments - that coordinates return 0 if solved) -* simple solver with small table for short solutions - -## ARM NEON intrinsics and other architectures - -* For ARM: use two uint8x16_t (or uint8x16x2_t) and vqtbl* instructions; - see https://developer.arm.com/architectures/instruction-sets/intrinsics -* Implement also SSE? Why not... - -## Optimizations - -* Moves: don't do full compose for U*, D*, *2 (I removed this because I - was using shuffle intructions wrong, should re-do it) -* transform edges only for h48 coord calculation -* ptable: since it is fully symmetric, do only U or U2 at depth 1 -* use threads: how to detect at runtime? what is sane number to default to? -* multisolve with adaptive threading -* Trans: don't do full compose, for some trans composing perm is enough. - Split out sumco() as a separate function and refactor, optimize. -* Use multi-move (up to 4/5 moves at once) -* see if vcube's method to flip all corners is better -* find a better way for computing the inverse? -* Transform with big table: make static cube actually static (how?) -* Use selfsim: in generating some tables, it is in thery possible to only check - the few transformations that give self-similarity instead of all 48. - The performance drop is almost insignificant, but I would like to figure out - the mistake I made previously. - -## Improvements and other things - -* Rename to libnissy (prefix public functions with nissy_?) -* add centers (and slice moves and rotations) - for avx2: centers in the same lane as corners, numbered from 9 to 14 -* for CO: move to bits 5 and 6, no need for padding bit -* manipulate move sequences (invert, unniss, cleanup, mirror / transform...) -* NISS: Add mask to moves (e.g. U | NISS where NISS = 32 or something); - adapt readmoves and writemoves. -* More I/O formats: - reid format - nissy - ascii art (color = 1 letter? color print?) - twizzle binary https://www.experiments.cubing.net/cubing.js/spec/binary/ - -## "Front-end" - -* nissy shell: make more usable, meaningful error messages etc - for example, it should check that the solver is valid -* Write adapter code for other languages: - python - hare (see blog post 2023-12-01 for ffi) - rust, go - dart ffi, js - java - -## More documentation? - -* Add documentation comments inside cube.c? -* Copy this to cube.c - -Transformations can be either simple rotations or a rotation composed -with a mirroring. A composed rotation + mirror is obtained by applying -the corresponding rotation to the solved cube mirrored along the M plane. - -For example, to apply the transformation RBm (mirrored RB) to a cube C: - 1. Apply a mirror along the M plane to the solved cube - 2. Rotate the mirrored cube with z' y2 - 3. Apply the cube C to the transformed solved cube - 4. Apply the transformations of step 1a and 1b in reverse - -## Future work? - -* A* on GPU? https://github.com/mwarzynski/uw_parallel_a_star