@kazimuth
Last active January 23, 2018 21:35
Battlecode OOM Fix

Why is my Battlecode Java player saying "Killed"?

Assuming you're running in Docker, this means you're running out of memory.

An initial fix you should apply is to put -Xmx40m in your player/run.sh; see the examplefuncsplayer-java/run.sh from the current scaffold for how this should work.

Depending on how your code works, this may fix the problem. However, in some cases you may need to tune the number to be lower (for example, -Xmx10m).
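If you want to confirm that the flag is actually reaching the JVM, one option is to print the heap limit Java thinks it has when your player starts. This is a minimal sketch: HeapCheck is a made-up name, while Runtime.getRuntime().maxMemory() is standard Java. The reported value can come out slightly below the number you passed to -Xmx, so treat it as approximate.

```java
// Hypothetical helper class; not part of the Battlecode scaffold.
class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in whole megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // With -Xmx40m this should print a number close to 40.
        System.out.println("JVM max heap: " + maxHeapMb() + " MB");
    }
}
```

If the number printed is in the hundreds or thousands, the -Xmx setting is probably not being applied.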

Note that you should NOT set -Xmx256m or -Xmx512m; if you're curious, see the next section of this document for why setting the number so low is necessary.

In the game running interface, you can also set Player memory limit (in mb) to 512 instead of 256. In the next release we will set this to be the default, and raise the memory limit in the scrimmage servers. This should give you a little more breathing room. We'll also be tuning the engine to use less memory, coming sometime in the future.

If you're still getting the issue, try adding a call to System.gc() at the end of each of your player's turns, or more frequently.
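As a rough sketch of what that looks like, the loop below asks for a collection after every turn. The class and method names are hypothetical and the Runnable stands in for your own per-turn logic; only System.gc() is real Java API. Note that System.gc() is a request, and the JVM is allowed to ignore it.

```java
// Hypothetical turn loop; in a real player the body would use the game controller.
class GcEveryTurn {
    /**
     * Runs the given number of turns, requesting a garbage collection
     * after each one. Returns how many collections were requested.
     */
    static int run(int turns, Runnable takeTurn) {
        int requests = 0;
        for (int t = 0; t < turns; t++) {
            takeTurn.run();   // your per-turn logic goes here
            System.gc();      // ask the JVM to collect before the next turn
            requests++;
        }
        return requests;
    }
}
```

Calling it more often than once per turn is the same idea: put the System.gc() call after whichever step allocates the most.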

If you're still getting the issue, you will need to reduce the amount of allocation in your code. If you're allocating large objects frequently, you should allocate them less often.

For instance, you may have the code:

// !!! DON'T DO THIS !!!
for (int i = 0; i < gc.myUnits().size(); i++) {
    Unit unit = gc.myUnits().get(i);
}
// !!! DON'T DO THIS !!!

In this code, every call to gc.myUnits() creates a new copy of the list of units. That will use a large amount of memory, and once you go over the limit Docker will kill your player. You should change your code to look like:

VecUnit units = gc.myUnits();
for (int i = 0; i < units.size(); i++) {
    Unit unit = units.get(i);
}

This will use significantly less memory, and as a bonus be way faster!
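To see how big the difference is, here is a small self-contained sketch that counts the copies. FakeController is a hypothetical stand-in for the real game controller (it uses a plain List<Integer> instead of VecUnit), and it assumes only what is stated above: every myUnits() call builds a fresh copy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the game controller: each myUnits() call copies the list.
class FakeController {
    private final List<Integer> units = new ArrayList<>();
    int copiesMade = 0;

    FakeController(int unitCount) {
        for (int i = 0; i < unitCount; i++) units.add(i);
    }

    List<Integer> myUnits() {
        copiesMade++;
        return new ArrayList<>(units);
    }
}

class CopyCount {
    /** The "don't do this" loop: copies once per condition check and once per body. */
    static int naiveLoop(int unitCount) {
        FakeController gc = new FakeController(unitCount);
        for (int i = 0; i < gc.myUnits().size(); i++) {
            Integer unit = gc.myUnits().get(i);
        }
        return gc.copiesMade;
    }

    /** The fixed loop: copies exactly once, before the loop starts. */
    static int cachedLoop(int unitCount) {
        FakeController gc = new FakeController(unitCount);
        List<Integer> units = gc.myUnits();
        for (int i = 0; i < units.size(); i++) {
            Integer unit = units.get(i);
        }
        return gc.copiesMade;
    }
}
```

With 100 units, naiveLoop makes 201 copies (101 condition checks plus 100 loop bodies), while cachedLoop makes 1.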

That's an annoying fix. Why do I have to do that?

The answer to this question is slightly technical, so bear with me. You also don't have to understand this to apply the fix; this is just for exposition's sake.

You've probably heard of Java's Garbage Collector. The Garbage Collector is a mechanism that every so often pauses your code and eats all of the objects you aren't using. For instance, look at the code:

for (int i = 0; i < 100; i++) {
    new Object();
}

You've made a bunch of objects, but you didn't do anything to clean them up. What happened to them? Do they just exist forever?

No, they get garbage collected. Java looks through your running code to find all the objects you're still using, and then throws away everything else, so that you can reuse the space those unused objects were taking up in your computer's RAM.

The garbage collector runs when it detects that you're running low on memory. Normally, this works fine. However, in Battlecode, it doesn't work quite right, for two reasons.

Firstly, Docker is slightly broken, and when Java is running inside Docker and trying to detect how much memory is available, it often gets the wrong result. This means that the garbage collector won't run as often - if Java thinks it has 1 GB of space when it actually only has 256 MB of space, it will blithely allocate more space than is actually available, and not collect any of it. Docker will then kill the process because it's zealous about this sort of thing, and you get the Killed message.

A second, similar problem is that the engine is not actually written in Java. It's written in a language called Rust, which we bind from Java - where "bind" means "call functions from another programming language". (Here's some of the engine's code, if you're curious.) When you call gc.writeTeamArray() in Java, some magic happens, and then the Rust function GameController::write_team_array is called. (The same thing happens when you call gc.write_team_array() in Python.)

That's very nice for us devs, because we only have to write the engine in one language instead of 3 or 4. However, there's a problem. When you create an object from the game engine in Java - for instance, by calling gc.myUnits() - the VecUnit object that's returned has two parts: a Java part, and a Rust part. The Java part is very small, around 32 bytes. It pretty much only has one component: a reference to the Rust part of the object.

The Rust part of the object, though, can be quite large. If there are 100 of your units on the map, the Rust part of the VecUnit is probably around 15 KB in size. Way bigger than 32 bytes! Other Rust objects are quite small though. For instance, a MapLocation is 12 bytes, and a Direction is only one byte. It's really only the large allocations that are the problem here - containers like VecUnit and AsteroidPattern.

The problem, then, is that the Java garbage collector doesn't know about the Rust part of the object; it only knows about the Java part. If you've allocated 1000 VecUnits, from Java's point of view you've allocated 32 KB; but from Rust's point of view, at around 15 KB each, you've allocated about 15 MB. Java then won't know that it needs to run the garbage collector; and again, you'll go over the memory limit, and Docker will kill your player. Killed.
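The arithmetic can be worked through directly. This sketch uses the approximate sizes quoted earlier (32 bytes for the Java part, roughly 15 KB for the Rust part of a VecUnit with 100 units); the class and constant names are made up for illustration, and 1 KB is taken as 1024 bytes.

```java
// Hypothetical names; the sizes are the rough figures from the text, not measurements.
class HiddenMemory {
    static final long JAVA_WRAPPER_BYTES = 32;          // Java part of each object
    static final long RUST_VECUNIT_BYTES = 15 * 1024;   // Rust part, ~100 units

    /** Bytes the Java garbage collector can see for this many VecUnits. */
    static long javaVisibleBytes(long count) {
        return count * JAVA_WRAPPER_BYTES;
    }

    /** Bytes held on the Rust side, which Java does not account for. */
    static long rustHiddenBytes(long count) {
        return count * RUST_VECUNIT_BYTES;
    }
}
```

For 1000 VecUnits that is 32,000 bytes visible to Java against 15,360,000 bytes on the Rust side, a factor of 480.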

The solution to this problem is to tell Java there's less memory available than there actually is. That way you can compensate for the memory that Java doesn't know about, and things will hopefully work out okay.

That's why you have to tune the -Xmx setting by hand, unfortunately. Depending on the ratio of large:small objects in your code, it'll need to take different values. You can also just take the shotgun approach and sprinkle System.gc() calls into your code, which will accomplish the same thing.

In some future release we may find a way to fix this problem; however, Java doesn't expose any APIs that would let you fix this sort of thing. If we do manage to get a better fix, it's going to require some black magic, and the sacrifice of at least two goats; so please bear with us while we poke around. We're also going to be tuning the engine to use less memory, and raising the memory limit on the scrim servers (up to 512 MB), to help alleviate this further.

We apologize for this problem; it wasn't something we saw coming when writing the engine. Thanks for your patience.
