commit f25a10e19eca294c4e6a99e4f80ce5cfd11a0e5f
parent 4a016679d1f5f33ab0679dfc315222797075dc65
Author: Sebastiano Tronto <sebastiano@tronto.net>
Date: Sun, 18 Aug 2024 10:41:52 +0200
Moved TODO.txt to personal folder
Diffstat:
D | TODO.txt | | | 193 | ------------------------------------------------------------------------------- |
1 file changed, 0 insertions(+), 193 deletions(-)
diff --git a/TODO.txt b/TODO.txt
@@ -1,193 +0,0 @@
-# Documentation and cleanup
-
-- Internal documentation
- - h48 solver and co
- - what is h and k
- - cube representation and base routines?
-- Split files
- - h48 into:
- - h48_base (coordinate computation)
- - h48_map (only the hashmap thing)
- - h48_gendata (table generation)
- - h48_solve (including stats solver)
-
-# H48 table generation
- - compute all tables for h<11
- x compute visited up to a fixed depth 8
- - compute additional step (if needed) to fill <=base
- - brute-force the last 2 steps (only 18 moves + 18*15 move pairs, + sim)
- - compare with known h0 results (from long-running test)
- - compute table for h=11
- - can it be unified to the other computation, or is it much better
- to do it ad hoc?
- - derive small tables from the large one to check correctness
- - this requires too much ram, but I can print the summary of the table
- - Add long-running test for h0k4 (maybe as a tool?)
- - tests for other sizes?
- - gendata tool: save tables?
- - optimize
- - use only transform_edges (need compose trans)
- - parallelize with pthread
-
-# Solver (Enrico)
-
-- Add a solver for h=0
-- check if this (or equivalent) works:
- ./run solve -solver H48 -options "2;20" -n 1 -M 10 -cube \
- "$(./run frommoves -moves "UFRUFU")"
-
-table base for k=2 (4 most common values start at)
- 0 8
- 1 8
- 2 8
- 3 8 or 9 (very close)
- 4 9
- 5 9
- 6 9
- 7 9 or 10 (very close)
- 8 10
- 9 10
- 10 10
- 11 11
-
-Solver
- - cleanup h48 solver
- - do not copy dfsarg, change and undo
- - implement and use premove (and test) instead of inverting
- - improve name of tables file in shell.c (include h value, maybe k, max)
- - benchmark for solve
- table generation, where to keep tables? in benchmark folder or in tables/?
- - more tricks for solver, optimize, try larger tables
- - remove solve_simple and maybe the whole solve_generic
- - shell: silently accept other formats too?
- - shell: allow generating multiple tables for different options
- - gendata: move info at start of tables?
-
-Cleanup cube_public and interface
- - remove options, use only solver name
- cleanup also benchmark
- - write a generic parse + dispatch to solver
- (maybe use helper function from solve_h48 for parsing hXkY)
-
-Goal: find out which k value is best
- - temporarily call current table and solver "k4" instead of h48
- - write table generation and solver for k2 and k1
- - benchmark for different sizes!
-
-Improvements
- - check hash of generated data
- - use interleaved tables (e.g. big table with k=2 or k=1 and interleaved
- small table with k=4 for better backup pruning)
-
-small things
- - maybe move part of the logic for coord_h48 (and its inverse) to
- utils.h (subsettoindex-like)
- - rename TYPE build switch to something more intuitive like ARCH
- remove one of TYPE and CUBE_TYPE (why do I have two?)
- - merge constants and utils?
-
-## H48 optimal solver (some has already been implemented)
-
-First compute co + csep. Use csep as a binary number (2^7 instead of 70,
-loose a factor of 1.8 but still fits in a few megabytes or less). Use
-co + csep as an index in a table whose entries have: 6 bits for ttrep,
-12 bits for rep, 4 bits for pruning. Optionally, 4 more bits could be
-used for the base of the pruning table, if we want to have a different
-base for each corner state; but probably not useful.
-
-If the first pruning is enough, or if the base value of the pruning table
-(see below) is too low, do not compute the full coordinate (which includes
-epsep + partial EO, 12 different sizes depending on how many edges).
-
-Otherwise, transform edges only using ttrep and compute full coordinate.
-Look up in table. 3 types of table:
-1. 4 bits per entry, full pruning table
-2. 3 bits with base value (let's try, why not)
-3. 2 bits with base value, nxopt style
-4. 1 bit per entry, telling only if more or less than mid value
-Types 2-4 require benchmarks, a lot of them.
-
-Inverse probing (no need to compute inverse, compute one at the beginning
-and keep adding premoves); better do first part of pruning for both
-normal and inverse and only then search in the full table.
-
-If inverse probing gives tight bound, reduce branching factor, optionally
-switch. Here NISS may be useful.
-
-## Other solvers
-
-* nxopt (various sizes, for comparison; also use base value probing
- and benchmarking)
-* Coordinate solver for replacing nissy backend (specify in the comments
- that coordinates return 0 if solved)
-* simple solver with small table for short solutions
-
-## ARM NEON intrinsics and other architectures
-
-* For ARM: use two uint8x16_t (or uint8x16x2_t) and vqtbl* instructions;
- see https://developer.arm.com/architectures/instruction-sets/intrinsics
-* Implement also SSE? Why not...
-
-## Optimizations
-
-* Moves: don't do full compose for U*, D*, *2 (I removed this because I
- was using shuffle intructions wrong, should re-do it)
-* transform edges only for h48 coord calculation
-* ptable: since it is fully symmetric, do only U or U2 at depth 1
-* use threads: how to detect at runtime? what is sane number to default to?
-* multisolve with adaptive threading
-* Trans: don't do full compose, for some trans composing perm is enough.
- Split out sumco() as a separate function and refactor, optimize.
-* Use multi-move (up to 4/5 moves at once)
-* see if vcube's method to flip all corners is better
-* find a better way for computing the inverse?
-* Transform with big table: make static cube actually static (how?)
-* Use selfsim: in generating some tables, it is in thery possible to only check
- the few transformations that give self-similarity instead of all 48.
- The performance drop is almost insignificant, but I would like to figure out
- the mistake I made previously.
-
-## Improvements and other things
-
-* Rename to libnissy (prefix public functions with nissy_?)
-* add centers (and slice moves and rotations)
- for avx2: centers in the same lane as corners, numbered from 9 to 14
-* for CO: move to bits 5 and 6, no need for padding bit
-* manipulate move sequences (invert, unniss, cleanup, mirror / transform...)
-* NISS: Add mask to moves (e.g. U | NISS where NISS = 32 or something);
- adapt readmoves and writemoves.
-* More I/O formats:
- reid format
- nissy
- ascii art (color = 1 letter? color print?)
- twizzle binary https://www.experiments.cubing.net/cubing.js/spec/binary/
-
-## "Front-end"
-
-* nissy shell: make more usable, meaningful error messages etc
- for example, it should check that the solver is valid
-* Write adapter code for other languages:
- python
- hare (see blog post 2023-12-01 for ffi)
- rust, go
- dart ffi, js
- java
-
-## More documentation?
-
-* Add documentation comments inside cube.c?
-* Copy this to cube.c
-
-Transformations can be either simple rotations or a rotation composed
-with a mirroring. A composed rotation + mirror is obtained by applying
-the corresponding rotation to the solved cube mirrored along the M plane.
-
-For example, to apply the transformation RBm (mirrored RB) to a cube C:
- 1. Apply a mirror along the M plane to the solved cube
- 2. Rotate the mirrored cube with z' y2
- 3. Apply the cube C to the transformed solved cube
- 4. Apply the transformations of step 1a and 1b in reverse
-
-## Future work?
-
-* A* on GPU? https://github.com/mwarzynski/uw_parallel_a_star