This is a high-level description of my Halite III bot. At the time of writing, a few days before the end of the competition, it has a rating of slightly over 74, which puts it at rank ~115.
The bot is based on Value Iteration and Uniform Cost Search for mining halite and returning it to the nearest dropoff. Dropoffs are not pre-planned -- a ship converts when enough halite is in the vicinity. Spawning decisions are made by mirroring the opponent (2P) or a simple linear classifier (4P).
Finding Halite to mine
The bot uses the Value Iteration algorithm to make its mining decisions. The basic algorithm, applied to Halite, looks like this:
    For all squares s:
        values[s.x][s.y] = halite[s.x][s.y] / 4
    Repeat 16 times:
        For all squares s:
            For all neighbors n of s:
                values[s.x][s.y] = max(values[s.x][s.y], 0.95 * values[n.x][n.y])
This creates a gradient the ships can follow to the highest density of halite.
The discount factor (0.95 in the example) naturally leads to the ships trading off immediate mining on the current square versus going for larger deposits further away.
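The pseudocode above can be written as a short NumPy sketch. A few things here are assumptions rather than details from the bot: the map is indexed as `[row, col]`, each sweep updates all squares at once (the pseudocode updates them in place), and `best_move` is a hypothetical helper showing one way a ship could follow the gradient.

```python
import numpy as np


def value_iteration(halite, discount=0.95, sweeps=16):
    """Spread discounted halite values across a toroidal map."""
    # Initial value: roughly what one turn of mining yields (a quarter of the square).
    values = np.asarray(halite, dtype=float) / 4.0
    for _ in range(sweeps):
        # np.roll wraps around the edges, which matches the toroidal playing field.
        neighbor_best = np.max(
            [np.roll(values, shift, axis=axis) for axis in (0, 1) for shift in (-1, 1)],
            axis=0,
        )
        values = np.maximum(values, discount * neighbor_best)
    return values


def best_move(values, row, col):
    """Return the (d_row, d_col) step with the highest value; (0, 0) means stay and mine."""
    rows, cols = values.shape
    options = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    # max() keeps the first maximum, so ties are resolved in favor of staying put.
    return max(options, key=lambda d: values[(row + d[0]) % rows, (col + d[1]) % cols])
```

With 16 sweeps, a rich square influences every square up to 16 steps away, and its value falls by 5% per step along the way.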
Improvements to the basic algorithm
The naive application of the algorithm will move all friendly ships in the same direction. This is obviously undesirable, because they get stuck behind each other and can't get to the halite. It's possible to recompute the VI for every turtle and avoid the problem, but a more elegant approach is to simply use a different estimate for the initial value, one that depends on the proximity of other ships.
Computing Value using a ConvNet
Instead of using the raw amount of harvestable halite as the reward value in Value Iteration, the bot computes its own estimate of how valuable moving a ship to a certain square is using a neural network.
It's a tiny, single-layer Convolutional Neural Network with a three-channel input, a symmetric 3x3 kernel, and a single output channel for the value.
Thus, the initial value of a square is determined by
- The amount of halite in a 3x3 neighborhood
- The number of friendly ships in the 3x3 neighborhood and how much cargo they hold
- The number of opponent ships in the 3x3 neighborhood and how much cargo they hold
After the initial value of a square is computed, Value Iteration takes over as before. Even the 3x3 neighborhood is sufficient for the network to detect nearby friendly ships and lower the value of a square sufficiently to remedy the 'thundering herd' problem.
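A minimal sketch of such a single-layer network follows. It rests on several assumptions: "symmetric" is read as one weight each for the center, the four edge cells and the four corner cells of every channel; the layer is purely linear, with no bias or activation; and the way ship counts and cargo are packed into one channel is left to the caller. The weights in the usage example are made up for illustration.

```python
import numpy as np


def symmetric_kernel(center, edge, corner):
    """Build a 3x3 kernel from three weights (assumed form of 'symmetric')."""
    return np.array(
        [[corner, edge, corner],
         [edge, center, edge],
         [corner, edge, corner]],
        dtype=float,
    )


def initial_values(channels, kernels):
    """Apply one 3x3 kernel per input channel and sum into a single value map.

    channels: array of shape (3, H, W): halite, friendly ships, opponent ships.
    kernels:  array of shape (3, 3, 3): one 3x3 kernel per channel.
    """
    out = np.zeros(channels.shape[1:], dtype=float)
    for channel, kernel in zip(channels, kernels):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                # Rolling by (-dy, -dx) puts channel[y + dy, x + dx] at [y, x],
                # with wrap-around at the edges of the toroidal map.
                shifted = np.roll(channel, (-dy, -dx), axis=(0, 1))
                out += kernel[dy + 1, dx + 1] * shifted
    return out


# Hypothetical weights: halite attracts, nearby friendly ships repel.
kernels = np.stack([
    symmetric_kernel(1.0, 0.5, 0.25),
    symmetric_kernel(-2.0, -1.0, -0.5),
    symmetric_kernel(0.0, 0.0, 0.0),
])
```

The output of `initial_values` would take the place of `halite / 4` as the starting point for Value Iteration.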
The neural network is able to coordinate the movement of a large number of ships, but it does not display any truly intelligent-seeming qualities. Larger networks (with a 9x9 kernel that can account for inspiration) gave rise to more interesting behavior, such as groups of ships 'bullying' opponent ships, walling off especially valuable parts of the map, or intentionally colliding with opposing ships. Despite that, their actual win rate was lower than that of the simple network described above, probably because there was not enough time to fully optimize the much larger number of weights.
All ships that can return more than 750 halite (after accounting for halite burned on the way home) will immediately set sail for the nearest dropoff.
Returning cargo is straightforward: use Uniform Cost Search to compute an optimal path from one or more starting locations (i.e. the shipyard and dropoffs) to all squares on the map. The ships that want to return can then follow the computed path backwards to the nearest dropoff.
The path cost calculation includes three components:
- Time: How many turns does this path take?
- Halite: How much halite is burned on this path?
- Proximity: Does the path bring a ship into contact with an opponent that could ram it?
These three different penalties are mixed using optimizable parameters.
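The search described above could look roughly like the following sketch. The weights `w_time`, `w_halite` and `w_danger` stand in for the optimizable parameters and their values are invented; `danger` is a hypothetical per-square score for opponent proximity; and the burn cost assumes a move costs about a tenth of the halite on the square being left.

```python
import heapq


def return_costs(halite, danger, sources, w_time=1.0, w_halite=0.01, w_danger=5.0):
    """Multi-source Uniform Cost Search on a toroidal grid.

    Returns (cost, next_step): cost[y][x] is the cheapest mixed cost of getting
    from (y, x) to any source, next_step[y][x] is the square to move to next.
    """
    height, width = len(halite), len(halite[0])
    cost = [[float("inf")] * width for _ in range(height)]
    next_step = [[None] * width for _ in range(height)]
    frontier = []
    for y, x in sources:
        cost[y][x] = 0.0
        heapq.heappush(frontier, (0.0, y, x))
    while frontier:
        c, y, x = heapq.heappop(frontier)
        if c > cost[y][x]:
            continue  # stale queue entry, a cheaper path was already found
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = (y + dy) % height, (x + dx) % width
            # A returning ship walks this edge backwards, from (ny, nx) to (y, x):
            # one turn, the halite burned leaving (ny, nx), the danger of entering (y, x).
            step = w_time + w_halite * (halite[ny][nx] // 10) + w_danger * danger[y][x]
            if c + step < cost[ny][nx]:
                cost[ny][nx] = c + step
                next_step[ny][nx] = (y, x)
                heapq.heappush(frontier, (c + step, ny, nx))
    return cost, next_step
```

Because the search starts at the dropoffs, a single run serves every returning ship: each ship only looks up `next_step` for its own square.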
The dropoff calculations are less straightforward, but manageable: if there is at least X halite in the vicinity, and we have more than a certain number of times as many ships as dropoffs, and there is no friendly dropoff within a certain distance, convert the current ship into a dropoff. The halite in friendly ships surrounding the current one is counted 4 times, in order to prefer dropoff locations that are already near turtles with cargo to return.
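As a rough sketch, the rule can be written as a single predicate. The three thresholds are hypothetical keyword arguments with no defaults, since their actual values are not given here; only the factor of 4 for nearby cargo comes from the description above.

```python
def should_convert(nearby_halite, nearby_friendly_cargo, n_ships, n_dropoffs,
                   dist_to_nearest_dropoff, *, min_halite,
                   min_ships_per_dropoff, min_dropoff_distance):
    """Decide whether the current ship should become a dropoff.

    All three thresholds are placeholders that the caller has to supply.
    """
    # Cargo on surrounding friendly ships counts four times as much as halite on the ground.
    weighted_halite = nearby_halite + 4 * nearby_friendly_cargo
    return (
        weighted_halite >= min_halite
        and n_ships > min_ships_per_dropoff * n_dropoffs
        and dist_to_nearest_dropoff > min_dropoff_distance
    )
```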
The amount of halite in the vicinity is calculated quickly using a Summed-area table. The idea behind a SAT is quite ingenious, but accounting for Halite's toroidal playing field requires handling ten different cases, so the implementation is a bit tedious.
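The sketch below shows one way to query a SAT on a wrapping map. It is not the ten-case implementation mentioned above but a swapped-in alternative: the wrapping window is split into at most four non-wrapping rectangles, each answered with the usual four-corner lookup. The square window shape and all function names are assumptions.

```python
import numpy as np


def build_sat(grid):
    """Summed-area table with an extra zero row and column for easy lookups."""
    sat = np.zeros((grid.shape[0] + 1, grid.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = grid.cumsum(axis=0).cumsum(axis=1)
    return sat


def rect_sum(sat, y0, y1, x0, x1):
    """Sum of grid[y0:y1, x0:x1] (half-open, no wrapping) in constant time."""
    return sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]


def split_wrapped(start, length, size):
    """Split a wrapping interval into one or two non-wrapping half-open pieces."""
    start %= size
    end = start + length
    if end <= size:
        return [(start, end)]
    return [(start, size), (0, end - size)]


def halite_around(sat, y, x, radius):
    """Total halite in the square window of the given radius around (y, x)."""
    height, width = sat.shape[0] - 1, sat.shape[1] - 1
    side = 2 * radius + 1
    if side > min(height, width):
        raise ValueError("window must not be larger than the map")
    rows = split_wrapped(y - radius, side, height)
    cols = split_wrapped(x - radius, side, width)
    return sum(rect_sum(sat, y0, y1, x0, x1) for y0, y1 in rows for x0, x1 in cols)
```

The table is built once per turn; every query afterwards costs at most four rectangle lookups, regardless of the window size.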
In two-player games, the bot produces ships until it has more ships + dropoffs than its opponent. Many people who mirror their opponent seem to neglect the dropoffs: they pull ahead by one ship and stop production, but fail to account for the opponent's dropoffs and the faster accumulation of halite they allow, which easily compensates for a deficit of one ship.
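The mirroring rule fits in a few lines. In this sketch the function name and the affordability check are assumptions; `ship_cost=1000` is the standard Halite III ship price.

```python
def should_spawn_2p(my_ships, my_dropoffs, opp_ships, opp_dropoffs,
                    banked_halite, ship_cost=1000):
    """Keep spawning until our ships + dropoffs exceed the opponent's."""
    if banked_halite < ship_cost:
        return False  # cannot afford a ship this turn
    return my_ships + my_dropoffs <= opp_ships + opp_dropoffs
```

Counting dropoffs on both sides means an opponent with one ship fewer but one dropoff more is still treated as level, so production continues.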
In FFA games the spawn decision is made using a linear classifier on the following features (scaled between 0 and 1):
- How many turns have elapsed?
- How many turns are remaining?
- How many friendly ships are there?
- How many opposing ships are there?
- How much halite is left (average per square)?
- Bias (always 1.0)
Each feature is multiplied by a weight, and if the sum of all weighted features is positive, a new ship is spawned.
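A minimal sketch of this classifier is shown below. The scaling constants `ship_scale` and `halite_scale` and the weights in the example are invented for illustration; in the bot the weights would come out of the parameter optimization.

```python
def build_features(turn, max_turns, my_ships, opp_ships, avg_halite,
                   *, ship_scale, halite_scale):
    """Scale the raw game state into [0, 1]; the scale constants are assumptions."""
    return [
        turn / max_turns,                      # turns elapsed
        (max_turns - turn) / max_turns,        # turns remaining
        min(1.0, my_ships / ship_scale),       # friendly ships
        min(1.0, opp_ships / ship_scale),      # opposing ships
        min(1.0, avg_halite / halite_scale),   # average halite per square
        1.0,                                   # bias
    ]


def should_spawn_ffa(features, weights):
    """Spawn a ship if the weighted sum of the features is positive."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return sum(f * w for f, w in zip(features, weights)) > 0.0


# Hypothetical weights, not the bot's actual values.
example_weights = [-1.0, 1.0, -2.0, 1.0, 2.0, -0.2]
```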
In order to avoid collisions, all ships 'bid' on the target squares they want to occupy on the next turn. The size of the bid is determined by the utility of that square/action -- for example, a ship that is full and returning gets priority over one that is just moving out to harvest (unless the latter is blocking the dropoff). There are several such exceptions, but nothing too interesting.
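One simple way to resolve such bids is a greedy pass from the highest bid to the lowest. This is only a sketch of the general idea: the data layout is an assumption, and the exceptions mentioned above (such as a ship blocking the dropoff) are left out.

```python
def resolve_moves(bids):
    """Greedily grant target squares to the highest bidders.

    bids: dict mapping ship id -> list of (bid, target) options, which should
          include staying put so that every ship has a fallback.
    Returns a dict mapping ship id -> granted target square.
    """
    all_bids = sorted(
        ((bid, ship, target) for ship, options in bids.items() for bid, target in options),
        key=lambda entry: -entry[0],  # highest bid first; ties keep insertion order
    )
    assigned, taken = {}, set()
    for bid, ship, target in all_bids:
        if ship in assigned or target in taken:
            continue  # ship already has a square, or the square is already claimed
        assigned[ship] = target
        taken.add(target)
    return assigned
```

A ship whose options are all claimed by higher bidders ends up without a square, so a real implementation needs some fallback for that case.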
The performance of the bot stands or falls with the setting of two dozen parameters and neural network weights (per map size).
OpenAI-style Evolution Strategies are a simple but effective method of blackbox parameter optimization that can deal with a large number of parameters (given enough compute).
Using them to tune the bot (over several 24h runs on a quad-core machine) has led to a cumulative increase, over multiple versions, of about 10 µ rating points over the baseline with hand-selected parameters.
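The core loop of an OpenAI-style Evolution Strategy is short enough to sketch. The hyperparameters below are arbitrary, and `toy_score` is a stand-in quadratic; for the bot, the score would presumably come from playing games with the perturbed parameters.

```python
import numpy as np


def evolution_strategies(score, theta, iterations=200, population=50,
                         sigma=0.1, learning_rate=0.01, seed=0):
    """Maximize a black-box score by following a noise-based gradient estimate."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(iterations):
        # Evaluate a population of randomly perturbed parameter vectors.
        noise = rng.standard_normal((population, theta.size))
        rewards = np.array([score(theta + sigma * eps) for eps in noise])
        spread = rewards.std()
        if spread == 0.0:
            continue  # all candidates scored the same, nothing to learn from
        # Normalize rewards, then step along the reward-weighted average of the noise.
        advantages = (rewards - rewards.mean()) / spread
        theta += learning_rate / (population * sigma) * (noise.T @ advantages)
    return theta


def toy_score(params, target=np.array([0.5, -0.3, 0.8])):
    """Stand-in objective: higher is better, with the maximum at `target`."""
    return -float(np.sum((params - target) ** 2))
```

Each candidate is scored independently, which is why the method spreads well across several cores.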
- The bot mostly ignores the opponent. There is no intentional ramming of opponents and only very simple collision avoidance.
- There is no inspiration management.