Skip to content

Instantly share code, notes, and snippets.

@regiskuckaertz
Last active July 11, 2024 13:22
Show Gist options
  • Save regiskuckaertz/d6483a07010ca78fcc66f1c6fa077fa3 to your computer and use it in GitHub Desktop.
Save regiskuckaertz/d6483a07010ca78fcc66f1c6fa077fa3 to your computer and use it in GitHub Desktop.
Sudoku: Twin exclusions

In this round of optimisations we will apply another common technique for making progress in a Sudoku. It is also a good opportunity to explore a couple of additional data structures.

Here are some stats from the first round of optimisations:

➜ head -n100 sudoku17.txt | time ./Sudoku
...
./Sudoku  71.92s user 0.55s system 99% cpu 1:12.78 total

After this round, we get to:

➜ head -n100 sudoku17.txt | time ./Sudoku
...
./Sudoku  0.67s user 0.02s system 94% cpu 0.735 total

Nice speedup 😎 Let's get cracking.

Hidden twin exclusion rule

The optimisation here is to spot "hidden twins" (or triplets) and remove all the other choices in those cells. To put it simply, if two cells are the only ones containing two digits, then we can filter out all other digits from those cells. The same reasoning applies with three cells sharing three digits, etc.

To achieve this, we need to get into a position where we can map cells with the digits they share. Say, for this row:

[  3  6789][  3  6789][ 2       ][  3  6  9][      78 ][   4     ][1        ][    5    ][      78 ]

We would like to arrive at (excluding singleton cells):

✓ [1,2,4]   -> [3,6,9]
  [1,2,5,9] -> [7,8]

Cells 1, 2 and 4 are the only ones where the digits 3, 6 and 9 can go; we can safely discard 7 and 8 from cells 1 and 2.

You will write the transformation exclude, similar to reduce, that performs the twin exclusion. Let's update prune as such:

prune = pruneWith boxs . pruneWith rows . pruneWith cols
  where pruneWith f = f . map (exclude . reduce) . f

Exercise 1. Cells by digit

Here's the plan of attack: first we map each digit to the list of cells where they appear, then we flip the relation so that each list of cells maps to a list of digits. Since we send reduce a row of choices (css), we will keep track of cell indices inside that row: cssi = zip [1..9] css :: [(Int, [Digit])].

Write the function:

cellsByDigit :: [(Int, [Digit])] -> [(Digit, [Int])]

You may use digits to help you build the result. It is up to you to decide what to do with singulars, or with digits that do not exist in the input.

Exercise 2. Digits by cells

Reversing the mapping mean we need to deal with potential conflicts. For this, I find it is easier to use a bona fide map data structure in order to accumulate the result:

import qualified Data.Map as M

NB: many packages are designed to be imported qualified, because they contain functions like map and filter that already exist in the Prelude.

Familiarise yourself with the API available in Data.Map, as you will need it to write:

digitsByCells :: [(Digit, [Int])] -> M.Map [Int] [Digit]

Exercise 3. Spread the map

The purpose of this exercise is to take each entry of the map produced by digitsByCells and perform the following transformation:

in :: ([Int], [Digit]) ↝ out :: [(Int, [Digit])]

It is trivial, but we need to think of what we have to do with it next. Remember from the description that we need only those entries where the length of the key equals the length of the value, then we must install the value as the new list of choices for each matching cell. This suggests some form of random access to the values in out.

Lists are not great for random access (O(n)), neither are maps (O(n log n)). For this, it's better to use either Data.Array or Data.Vector, both of which offer random access in constant time. I have used the former, but you are free to choose whichever you want.

import qualified Data.Array as A

Write the function:

spreadDigits :: M.Map [Int] [Digit] -> A.Array Int (Maybe [Digit])

The function accumArray is perfect for this use case.

Exercise 4. Wrapping up

We can now write reduce:

reduce xss = map trim xssi
  where xssi = zip [1..9] digits
        trim (i, xs) = fromMaybe xs (spds ! i)
        spds = _

where fromMaybe a optA is the Haskell version of optA getOrElse a.

With that, run your program and check the results.

Note that there is some subtelty I did not mention in the exercises above and it is likely you will fall into the trap. If yes, you will have to investigate the behaviour of your program in GHCi. Have fun!

@regiskuckaertz
Copy link
Author

regiskuckaertz commented Sep 23, 2018

Further optimisations

If survival of humanity depended on solving sudokus as fast as possible, we'd have to perform further optimisations. I wrote a version replacing [Digit] with bit arrays—since our set of solutions for a cell is a subset of [1..9], we can encode this using a 16-bit word, Word16 and perform bitwise operations. The result was 2x faster (~350s for solving 49K sudokus).

The way to investigate is to first gather some useful statistics about the behaviour of your program. To do that, you need to enable runtime system flags at compile-time with -rtsopts:

ghc --make -rtsopts Sudoku.hs

When executing your program, pass options between +RTS and -RTS (you can omit the latter if your program does not take any argument). For example, here are some memory statistics using the -s option:

➜ head -n100 sudoku17.txt | ./Sudoku3 +RTS -sstderr > /dev/null
   1,888,973,200 bytes allocated in the heap
      44,348,000 bytes copied during GC
         188,288 bytes maximum residency (12 sample(s))
          32,664 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1805 colls,     0 par    0.064s   0.068s     0.0000s    0.0015s
  Gen  1        12 colls,     0 par    0.001s   0.001s     0.0000s    0.0001s

  INIT    time    0.000s  (  0.003s elapsed)
  MUT     time    0.942s  (  0.950s elapsed)
  GC      time    0.065s  (  0.068s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    1.007s  (  1.021s elapsed)

  %GC     time       6.4%  (6.7% elapsed)

  Alloc rate    2,006,011,961 bytes per MUT second

  Productivity  93.6% of total user, 93.0% of total elapsed

Our program only uses 3MB of memory and yields to the GC only ~7% of the time; that's already quite good. Memory usage is probably not a good area to look at first. However, if it were the next step would be to take a heap profile with variants of -h and convert it with hp2ps:

sudoku3

It is most likely a thorough time/space profile will help identify cost centres.

ghc --make -rtsopts -prof -fprof-auto -fforce-recomp Sudoku3.hs
head -n100 sudoku17.txt | ./Sudoku3 +RTS -p > /dev/null

Look into Sudoku3.prof, the interesting bit is this table:

COST CENTRE MODULE SRC %time %alloc
cellsByDigit.cellsFor.cs Main Sudoku3.hs:77:11-53 32.3 24.0
diff Main Sudoku3.hs:(133,1)-(135,41) 11.3 9.6
exclude.cbds Main Sudoku3.hs:67:9-109 9.6 8.4
digitsByCells.insert Main Sudoku3.hs:71:9-75 5.8 2.6
nodups Main Sudoku3.hs:(123,1)-(124,43) 4.7 2.7
exclude.trim Main Sudoku3.hs:64:9-42 4.3 3.1
exclude.dbc Main Sudoku3.hs:66:9-105 3.1 2.9
cellsByDigit.cellsFor Main Sudoku3.hs:(76,3)-(77,53) 2.6 2.2
sings Main Sudoku3.hs:138:1-23 2.4 3.0

Aaaaand there it's clear I would drill down into my version of cellsByDigit, particularly line 77, which eats up 32% of the total runtime!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment