In this round of optimisations we will apply another common technique for making progress in a Sudoku. It is also a good opportunity to explore a couple of additional data structures.
Here are some stats from the first round of optimisations:
➜ head -n100 sudoku17.txt | time ./Sudoku
...
./Sudoku 71.92s user 0.55s system 99% cpu 1:12.78 total
After this round, we get to:
➜ head -n100 sudoku17.txt | time ./Sudoku
...
./Sudoku 0.67s user 0.02s system 94% cpu 0.735 total
Nice speedup 😎 Let's get cracking.
Hidden twin exclusion rule
The optimisation here is to spot "hidden twins" (or triplets) and remove all the other choices in those cells. To put it simply, if two cells are the only ones containing two digits, then we can filter out all other digits from those cells. The same reasoning applies with three cells sharing three digits, etc.
To achieve this, we need to get into a position where we can map cells with the digits they share. Say, for this row:
[ 3 6789][ 3 6789][ 2 ][ 3 6 9][ 78 ][ 4 ][1 ][ 5 ][ 78 ]
We would like to arrive at (excluding singleton cells):
✓ [1,2,4] -> [3,6,9]
[1,2,5,9] -> [7,8]
Cells 1, 2 and 4 are the only ones where the digits 3, 6 and 9 can go; we can safely discard 7 and 8 from cells 1 and 2.
You will write the transformation exclude
, similar to reduce
, that performs the twin exclusion. Let's update prune
as such:
prune = pruneWith boxs . pruneWith rows . pruneWith cols
where pruneWith f = f . map (exclude . reduce) . f
Here's the plan of attack: first we map each digit to the list of cells where they appear, then we flip the relation so that each list of cells maps to a list of digits. Since we send reduce
a row of choices (css
), we will keep track of cell indices inside that row: cssi = zip [1..9] css :: [(Int, [Digit])]
.
Write the function:
cellsByDigit :: [(Int, [Digit])] -> [(Digit, [Int])]
You may use digits
to help you build the result. It is up to you to decide what to do with singulars, or with digits that do not exist in the input.
Reversing the mapping mean we need to deal with potential conflicts. For this, I find it is easier to use a bona fide map data structure in order to accumulate the result:
import qualified Data.Map as M
NB: many packages are designed to be imported qualified, because they contain functions like map
and filter
that already exist in the Prelude
.
Familiarise yourself with the API available in Data.Map
, as you will need it to write:
digitsByCells :: [(Digit, [Int])] -> M.Map [Int] [Digit]
The purpose of this exercise is to take each entry of the map produced by digitsByCells
and perform the following transformation:
in :: ([Int], [Digit]) ↝ out :: [(Int, [Digit])]
It is trivial, but we need to think of what we have to do with it next. Remember from the description that we need only those entries where the length of the key equals the length of the value, then we must install the value as the new list of choices for each matching cell. This suggests some form of random access to the values in out
.
Lists are not great for random access (O(n)), neither are maps (O(n log n)). For this, it's better to use either Data.Array
or Data.Vector
, both of which offer random access in constant time. I have used the former, but you are free to choose whichever you want.
import qualified Data.Array as A
Write the function:
spreadDigits :: M.Map [Int] [Digit] -> A.Array Int (Maybe [Digit])
The function accumArray
is perfect for this use case.
We can now write reduce
:
reduce xss = map trim xssi
where xssi = zip [1..9] digits
trim (i, xs) = fromMaybe xs (spds ! i)
spds = _
where fromMaybe a optA
is the Haskell version of optA getOrElse a
.
With that, run your program and check the results.
Note that there is some subtelty I did not mention in the exercises above and it is likely you will fall into the trap. If yes, you will have to investigate the behaviour of your program in GHCi. Have fun!
Further optimisations
If survival of humanity depended on solving sudokus as fast as possible, we'd have to perform further optimisations. I wrote a version replacing
[Digit]
with bit arrays—since our set of solutions for a cell is a subset of[1..9]
, we can encode this using a 16-bit word,Word16
and perform bitwise operations. The result was 2x faster (~350s for solving 49K sudokus).The way to investigate is to first gather some useful statistics about the behaviour of your program. To do that, you need to enable runtime system flags at compile-time with
-rtsopts
:When executing your program, pass options between
+RTS
and-RTS
(you can omit the latter if your program does not take any argument). For example, here are some memory statistics using the-s
option:Our program only uses 3MB of memory and yields to the GC only ~7% of the time; that's already quite good. Memory usage is probably not a good area to look at first. However, if it were the next step would be to take a heap profile with variants of
-h
and convert it withhp2ps
:It is most likely a thorough time/space profile will help identify cost centres.
Look into
Sudoku3.prof
, the interesting bit is this table:Aaaaand there it's clear I would drill down into my version of
cellsByDigit
, particularly line 77, which eats up 32% of the total runtime!