Thank you to CodinGame for running the contest! I really enjoyed solving this problem.
I finished 14th (of 2,865):
https://www.codingame.com/contests/spring-challenge-2025/leaderboard/global
Since most people probably landed on the same general approach, I’ll focus mainly on a timeline of changes I made along with their impacts on my performance.
My solution throughout this entire competition has been a BFS with a hash table that maps a state (a 9x3 bitmap) to a count of times that state was seen. Language choice, hash table implementation, state canonicalization, and modular arithmetic turned out to be the most important concepts of this challenge.
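As a rough sketch of that state representation (the exact bit layout I used may differ): a 3x3 board of values 0 through 6 fits in 27 bits, three per cell, so a whole board is a single small integer that's cheap to hash and compare.

```rust
// Pack a 3x3 board of values 0..=6 into 27 bits, three per cell.
// The cell ordering here is illustrative; any fixed layout works.
fn pack(board: [u8; 9]) -> u32 {
    board
        .iter()
        .enumerate()
        .fold(0u32, |acc, (i, &v)| acc | (v as u32) << (3 * i))
}

// Read a single cell back out of a packed state.
fn get(state: u32, i: usize) -> u8 {
    ((state >> (3 * i)) & 0b111) as u8
}
```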
Starting off, I built a solution in C# just so I could understand the rules and get to 100%. This got me as low as 6,000ms.
Very quickly I realized that a more performant language was necessary, so I decided to try Rust. This was my first real attempt at using Rust, and it worked out pretty well; I learned a lot. Migrating my logic got me down to 1,700ms, which surprised me. The C# GC (or my usage of it) really hurts. I'll most likely be using Rust for CG competitions from now on.
Swapping the default Hasher implementation (the DoS-resistant SipHash) for a simplified version of FxHasher (which is just a wrapping add, multiply, and shift) got me down to 1,500ms.
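A minimal version of that idea looks like the following. This is a sketch, not my exact code or constants; the real FxHasher (from the rustc-hash crate) uses a rotate, xor, and wrapping multiply by a large odd constant.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// A fast, non-cryptographic hasher: just a wrapping add and a
// wrapping multiply per byte, plus a shift-xor in finish().
// Fine for a search where adversarial keys aren't a concern.
#[derive(Default)]
struct FastHasher(u64);

impl Hasher for FastHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self
                .0
                .wrapping_add(b as u64)
                .wrapping_mul(0x9E37_79B9_7F4A_7C15);
        }
    }
    fn finish(&self) -> u64 {
        self.0 ^ (self.0 >> 32)
    }
}

// Drop-in replacement for the default HashMap hasher.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<FastHasher>>;

// Helper to hash one packed state directly.
fn hash_u32(x: u32) -> u64 {
    let mut h = FastHasher::default();
    h.write(&x.to_le_bytes());
    h.finish()
}
```

Usage is just `let mut counts: FastMap<u32, u64> = FastMap::default();` wherever a `HashMap` was used before.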
My biggest improvement was handling board symmetry. By calculating all 8 mirrored and rotated boards and shuffling counts between the symmetries as the search progressed deeper, I dropped an entire second, from 1,500ms down to 500ms.
This 8x8 mapping table and its inverse somehow took me a couple sheets of graphing paper to wrap my head around, but I got it working and it paid off.
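The core of the symmetry trick can be sketched like this (index permutations for a row-major 3x3 layout; the count-shuffling between symmetric children that I describe above is omitted here): pick the minimum over all 8 rotations/reflections as the canonical representative, so symmetric boards collapse into one hash-table entry.

```rust
// Index permutations on a row-major 3x3 grid:
// rotate 90 degrees clockwise, and mirror left-right.
const ROT90: [usize; 9] = [6, 3, 0, 7, 4, 1, 8, 5, 2];
const FLIP: [usize; 9] = [2, 1, 0, 5, 4, 3, 8, 7, 6];

// Apply a cell permutation to a packed state (3 bits per cell).
fn apply(state: u32, perm: &[usize; 9]) -> u32 {
    let mut out = 0u32;
    for (i, &src) in perm.iter().enumerate() {
        out |= ((state >> (3 * src)) & 0b111) << (3 * i);
    }
    out
}

// Canonical form: minimum over all 8 symmetries
// (4 rotations, each optionally mirrored).
fn canonical(state: u32) -> u32 {
    let mut best = u32::MAX;
    let mut s = state;
    for _ in 0..4 {
        best = best.min(s).min(apply(s, &FLIP));
        s = apply(s, &ROT90);
    }
    best
}
```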
After this I spent a lot of time messing with the Rust language itself, learning how to profile code, and tweaking and precomputing stuff wherever possible. Converting vectors to arrays dropped me to 300ms, and probably a dozen other changes combined got me down to 270ms. The main things that helped were simplifying loops, if statements, and bitwise operations (like flipping and rotating a board in one operation instead of a flip followed by a rotate).
At this point I recalled basic modular arithmetic rules and realized how wasteful it was to store all my values in u128 types (I had sized them that way because the raw counts would overflow even a u64). Just by doing the math correctly I was able to drop my state size from 132 bytes to 36, which brought my overall time down to 100ms.
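The rule being applied: since the answer is only needed modulo some modulus m, and counts only ever combine by addition and multiplication, every intermediate count can be reduced mod m and kept in a u32. A sketch (helper names are mine, and I'm using 2^30 as the modulus for illustration):

```rust
// Keep every count reduced mod 2^30 so it fits in a u32.
// (a + b) mod m == ((a mod m) + (b mod m)) mod m, and likewise
// for multiplication, so reducing early never changes the answer.
const MOD: u64 = 1 << 30;

fn add_mod(a: u32, b: u32) -> u32 {
    ((a as u64 + b as u64) % MOD) as u32
}

fn mul_mod(a: u32, b: u32) -> u32 {
    ((a as u64 * b as u64) % MOD) as u32
}
```

The widening to u64 inside the helpers is the only place a bigger integer is needed; the stored values stay small.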
From here I never really made much progress despite spending a ton of time building custom trees and hash maps and hybrids. The closest I got to matching the performance of the built-in HashMap was still about 5% slower, so in the end I just went with a HashMap that mapped states to indexes, and a vector that contained the next depth’s states to expand. This gave me the best combination of key lookup speed and full set iteration speed in testing.
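That final layout might look something like this (field and type names are mine): the map gives O(1) lookup from a state to its slot, while the dense Vec gives fast iteration over the frontier when expanding the next depth.

```rust
use std::collections::HashMap;

// One BFS frontier: a map from state to its index,
// plus a dense list of (state, count) pairs to iterate.
struct Frontier {
    index_of: HashMap<u32, usize>, // state -> position in `entries`
    entries: Vec<(u32, u32)>,      // (state, count mod 2^30)
}

impl Frontier {
    fn new() -> Self {
        Frontier { index_of: HashMap::new(), entries: Vec::new() }
    }

    // Merge `count` occurrences of `state` into the frontier.
    fn add(&mut self, state: u32, count: u32) {
        if let Some(&i) = self.index_of.get(&state) {
            let c = &mut self.entries[i].1;
            *c = ((*c as u64 + count as u64) % (1 << 30)) as u32;
        } else {
            self.index_of.insert(state, self.entries.len());
            self.entries.push((state, count));
        }
    }
}
```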
My state generation started with a precomputed array of neighbors for each cell that I’d iterate through, but later this was changed to a hardcoded list of all permutations which helped a bit. I tried precomputing large tables of masks and resulting child states to hopefully generate states in only a few lookups, but it was somehow slower.
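For reference, the precomputed adjacency that the first version iterated over is just the orthogonal neighbors of each cell on the 3x3 board (row-major indices); the game-rule logic on top of it is omitted here.

```rust
// Orthogonal neighbors for each cell of a 3x3 grid, row-major:
//   0 1 2
//   3 4 5
//   6 7 8
const NEIGHBORS: [&[usize]; 9] = [
    &[1, 3],
    &[0, 2, 4],
    &[1, 5],
    &[0, 4, 6],
    &[1, 3, 5, 7],
    &[2, 4, 8],
    &[3, 7],
    &[4, 6, 8],
    &[5, 7],
];
```

Unrolling the loop over this table into hardcoded per-cell code removes the indirection entirely, which is where the small win came from.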
What I find interesting (especially after reading some of the other postmortems) is that I was able to finish as well as I did without any unsafe code. I certainly tried quite a few things that required it, but they never performed better in practice and so didn't make it into the final version.
Since I suspected the final tests might just be slight modifications of the current validators (rotated, or run to a greater depth), in the last few hours I went with the crowd and hardcoded all the known validators. Hopefully the final validators were chosen randomly and fairly to prevent such hardcoding, but if not, I wanted to partake.
In the end, my flamegraph indicated that I was spending about 18% of my time canonicalizing, 16% generating states, 28% doing hashmap stuff, and 13% adding to and removing from vectors. With my current Rust experience, I think this is as performant as I can get without digging deeper into the finer points of the language.
Because the search space for this game is so low, I was able to calculate a massive lookup table that maps any unique state count (what you see on the tests and validators) to a list of board/depth pairs. If this becomes a standard optimization puzzle after the competition ends then I’ll be able to hardcode the solutions to any fixed validators very quickly. People were discussing the "trial and error" approach to discovering the validators, but when you know the unique state count you can actually just compute them.
To combat this, you could have several validators that change on every submission, drastically increase the number of validators, or change the measurement from milliseconds to maybe counting the number of tests you can complete in the given time. If the validators change randomly then it’ll mess with the rankings and promote frequent resubmits. If there are enough validators that it becomes infeasible to hardcode them all in 100k chars (~200kb) then it’ll tax your servers (especially for slower bots that need the full 10 seconds). If the scoring algorithm changes you’ll still have problems with either random submits or hardcoding. It’ll be hard for the results of an ongoing puzzle to be meaningful.
Someone in Discord also mentioned making this into a two player game, which is actually a great idea since it would let you randomly generate the test cases while still being fair. This isn't to suggest that players take turns since that would change the rules. Basically the bots would operate exactly as they do today without any knowledge of their competitor. Both players would get the same test and faster bots would rise to the top. Hardcoding wouldn't benefit you much. Lower leagues could get simpler boards.
Thanks!
