DataWraith/WraithBot.md

## WraithBot.md

      
    Raw
  

              WraithBot.md
            
          
    WraithBot

This is a high-level description of my Halite III bot. At the time of writing,
a few days before the end of the competition, it has a rating of slightly over
74, which puts it at rank ~115.
Bot description

The bot is based on Value Iteration and Uniform Cost Search for mining halite
and returning it to the nearest dropoff. Dropoffs are not pre-planned -- a ship
converts when enough halite is in the vicinity. Spawning decisions are made by
mirroring the opponent (2P) or a simple linear classifier (4P).
Finding Halite to mine

The bot uses the Value Iteration
algorithm to make its mining decisions. The basic algorithm, applied to Halite,
looks like this:
For all squares s:
  values[s.x][s.y] = halite[s.x][s.y] / 4

Repeat 16 times:
  For all squares s:
    For all neighbors n of s:
      values[s.x][s.y] = max(values[s.x][s.y], 0.95 * values[n.x][n.y])

This creates a gradient the ships can follow to the highest density of halite.
The discount factor (0.95 in the example) naturally leads to the ships
trading off immediate mining on the current square versus going for larger
deposits further away.
Improvements to the basic algorithm

The naive application of the algorithm will move all friendly ships in the same
direction. This is obviously undesirable, because they get stuck behind each
other and can't get to the halite. It's possible to recompute the VI for every
turtle and avoid the problem, but a more elegant approach is to simply use a
different estimate for the initial value that depends on the proximity of
ships:
Computing Value using a ConvNet

Instead of using the raw amount of harvestable halite as the reward value in
Value Iteration, the bot computes its own estimate of how valuable moving a
ship to a certain square is using a neural network.
It's a tiny, single-layer Convolutional Neural Network with a three-channel
input, symmetric 3x3 kernel and a single output channel for the value.
Thus, the initial value of a square is determined by

The amount of halite in a 3x3 neighborhood
The number of friendly ships in the 3x3 neighborhood and how much cargo they
hold
The number of opponent ships in the 3x3 neighborhood and how much cargo they
hold

After the initial value of a square is computed, Value Iteration takes over as
before. Even the 3x3 neighborhood is sufficient for the network to detect
nearby friendly ships and lower the value of a square sufficiently to remedy
the 'thundering herd' problem.
Behavior

The neural network is able to coordinate movement of a large number of ships,
but it does not display any truly intelligent-seeming qualities. Larger
networks (with a 9x9 kernel that can account for inspiration) gave rise to more
interesting behavior, such as groups of ships 'bullying' opponent ships, walling
off especially valuable parts of the map or intentionally colliding with
opposing ships. Despite that, the actual performance in terms of winrate was inferior to the simple network described above,
probably because there was not enough time to fully optimize the much more
numerous weights of the larger network.
Returning Cargo

All ships that can return more than 750 halite (after accounting for halite
burned on the way home) will immediately set sail for the nearest dropoff.
Returning cargo is straightforward: use Uniform Cost Search to compute an
optimal path from one or more starting locations (i.e. the shipyard and
dropoffs) to all squares on the map. The ships that want to return can then
follow the computed path backwards to the nearest dropoff.
The path cost calculation includes three components:

Time: How many turns does this path take?
Halite: How much halite is burned on this path?
Proximity: Does the path bring a ship into contact with an opponent that
could ram it?

These three different penalties are mixed using optimizable parameters.
Dropoff Calculations

The dropoff calculations are less straightforward, but manageable: if there is
more than X amount of halite in the vicinity, and we have more than k times
as many ships as dropoffs, and there is no friendly dropoff within N squares,
convert the current ship into a dropoff. The halite in friendly ships
surrounding the current one is counted 4 times, in order to preferentially make
dropoffs that are already near turtles that have cargo to return.
Calculating how much halite is in the vicinity quickly is done using a
Summed-area table. The idea
behind a SAT is quite ingenious, but accounting for Halite's toroidal playing
field requires taking ten different cases that can occur into account, so the
implementation is a bit tedious.
Spawn decisions

In two-player games, the bot produces ships until it has more ships + dropoffs than its opponent. Accounting for the number of dropoffs is something
a lot of people following the strategy of mirroring their opponent seemingly
neglected. They pull ahead by one ship and stop production, but fail to account
for the number of dropoffs and the faster accumulation of halite they allow,
which easily compensates for a deficit of one ship.
In FFA games the spawn decision is made using a linear classifier on the
following features (scaled between 0 and 1):

How many turns have elapsed?
How many turns are remaining?
How many friendly ships are there?
How many opposing ships are there?
How much halite is left (average per square)?
Bias (always 1.0)

Each feature is multiplied by a weight, and if the sum of all weighted features
is positive, a new ship is spawned.
Collision avoidance

In order to avoid collisions, all ships 'bid' on the target squares they want
to occupy on the next turn. The height of the bid is determined from the
utility that square/action has -- for example, a ship that is full and
returning gets priority over one that is just moving out to harvest (unless it
is blocking the dropoff). There are several such exceptions, but nothing too
interesting.
Parameter optimization

The performance of the bot stands and falls with the setting of two dozen
parameters and neural network weights (per map size).
OpenAI-style Evolution
Strategies are a simple but
effective method of blackbox parameter optimization that can deal with a large
number of parameters (given enough compute).
Using them to tune the bot (over several 24h runs on a quad-core machine) has led to a (cumulative over multiple
versions) increase of about 10µ rating points over the baseline with
hand-selected parameters.
What's missing


The bot does mostly ignore the opponent. There is no intentional ramming of opponents
and only very simple collision avoidance.
There is no inspiration management