@mandarinx
Last active September 28, 2023 03:51
Unity3D optimization tips

Some code allocates memory when running in the editor but not on the device. This is because Unity does extra error handling in the editor so that error messages can be written to the console, and much of that code is stripped from builds. It's therefore important to profile on the device, and to do so regularly.

Optimizations often make code more rigid and harder to reason about. Don't optimize until you really need to.

When profiling, wrap methods or portions of a method in Profiler.BeginSample(string name) and Profiler.EndSample() to make them easier to find in the profiler.

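using UnityEngine;
using UnityEngine.Profiling;

public class SomeClass : MonoBehaviour {
	
	void Update() {
		// Misc. code here ...

		// Everything between BeginSample and EndSample shows up under this name in the profiler.
		Profiler.BeginSample("Heavy calculations");
		// Heavy calculations
		Profiler.EndSample();

		// More misc. code here
	}
}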

Enum as key in a dictionary

Enums can be used as keys in a dictionary, but they require some setup to avoid littering the heap.

Mono boxes the enum when comparing dictionary keys, so looking up an item via ContainsKey, TryGetValue or the indexer (dictionary[key]) allocates objects on the heap.

The boxing happens because of Mono's default equality comparer, and can be fixed by passing a custom IEqualityComparer<T> to the dictionary's constructor. The downside is that you need to implement an equality comparer for each of your enums. See the discussion on Stack Overflow and the blog post from Somasim for a more in-depth explanation.
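A minimal sketch of such a comparer, assuming a hypothetical Weapon enum and made-up damage values. Because the comparer's methods are strongly typed, the dictionary can compare and hash keys without boxing them:

```csharp
using System.Collections.Generic;

// Hypothetical enum used for illustration.
public enum Weapon { Sword, Bow, Staff }

// Strongly typed comparer: compares and hashes Weapon values without boxing.
public sealed class WeaponComparer : IEqualityComparer<Weapon> {
	// The comparer holds no state, so one shared instance is enough.
	public static readonly WeaponComparer Instance = new WeaponComparer();

	public bool Equals(Weapon a, Weapon b) {
		return a == b;
	}

	public int GetHashCode(Weapon weapon) {
		return (int)weapon;
	}
}

public static class WeaponTable {
	// The comparer is passed to the constructor, so the dictionary uses it instead of the default one.
	public static Dictionary<Weapon, int> CreateDamageTable() {
		var damage = new Dictionary<Weapon, int>(3, WeaponComparer.Instance);
		damage[Weapon.Sword] = 12;
		damage[Weapon.Bow] = 8;
		damage[Weapon.Staff] = 5;
		return damage;
	}
}
```

You need one such comparer per enum type, which is the downside mentioned above.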

Enums are backed by an integral type (int by default) and can therefore be cast to and from int. You can write a utility method that handles the casting for you, like this:

public Something GetSomething(SomeEnum e) {
	return someDictionary[(int)e];
}

With a method like this, you can safely redefine your dictionary to use an int as key, avoid the allocation problems and still have the readability of using an Enum.
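The same idea can be wrapped in a small class so the int keys stay an internal detail. This is a sketch; EnumKeyedTable is a hypothetical name, and SomeEnum is given three made-up values for illustration:

```csharp
using System.Collections.Generic;

// Hypothetical enum standing in for SomeEnum.
public enum SomeEnum { First, Second, Third }

// Stores values under int keys, so the dictionary's default comparer never
// has to box an enum, while callers keep working with SomeEnum.
public class EnumKeyedTable<TValue> {
	private readonly Dictionary<int, TValue> items;

	public EnumKeyedTable(int capacity) {
		items = new Dictionary<int, TValue>(capacity);
	}

	public void Set(SomeEnum key, TValue value) {
		items[(int)key] = value;
	}

	public bool TryGet(SomeEnum key, out TValue value) {
		return items.TryGetValue((int)key, out value);
	}

	public TValue this[SomeEnum key] {
		get { return items[(int)key]; }
	}
}
```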

References

Struct as key in a dictionary

Just like enums, structs get boxed and put on the heap when used as keys in a dictionary. The solution is to have the struct implement IEquatable<T>, override Equals(object) and GetHashCode(), and overload the == and != operators so that comparisons are strongly typed.

using System;

public struct MyStruct : IEquatable<MyStruct> {
	public readonly int x;
	public readonly int y;
	public readonly int z;

	public MyStruct(int x, int y, int z) {
		this.x = x;
		this.y = y;
		this.z = z;
	}

	// Used by non-generic code; the argument arrives boxed.
	public override bool Equals(object obj) {
		return obj is MyStruct && Equals((MyStruct)obj);
	}

	// Strongly typed comparison picked up by the dictionary's default comparer; no boxing.
	public bool Equals(MyStruct i) {
		return (x == i.x) && (y == i.y) && (z == i.z);
	}

	public override int GetHashCode() {
		// Multiply-and-add so that, e.g., (1, 2, 3) and (3, 2, 1) hash differently.
		unchecked {
			int hash = 17;
			hash = hash * 31 + x;
			hash = hash * 31 + y;
			hash = hash * 31 + z;
			return hash;
		}
	}

	public static bool operator ==(MyStruct a, MyStruct b) {
		return a.Equals(b);
	}

	public static bool operator !=(MyStruct a, MyStruct b) {
		return !a.Equals(b);
	}
}

References

Lists

Iterating a list using foreach creates a new enumerator instance, which Mono discards when the loop finishes, causing unnecessary memory allocations. Tests show that some cases don't allocate, such as using foreach on a generic List or Dictionary. See the link to Jackson Dunstan's blog.

When developing, default to using foreach because it is easier to read and work with. Replace it with a for loop only when you see the need for it, which you'll discover during optimization. If you do replace foreach with for on a list, cache the list's Count in a local variable before starting the iteration.

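You can declare the variables that hold object references outside the loop and reuse them, like this. Note that declaring a reference-type variable inside the loop doesn't allocate anything on the heap either; heap allocations come from creating new objects, for example with new.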

SomeClass someInstance = null;
int count = someList.Count;
for (int i = 0; i < count; i++) {
	someInstance = someList[i];
}

In any case, preallocate slots for arrays, lists and dictionaries when instantiating them. Pick the maximum number of items you expect to fill the collection with.

When you create a list without specifying a capacity, its internal array starts out empty. When list.Add() needs more room than the current capacity, the list allocates a new, larger internal array (typically twice the old size), copies the existing items into it, and leaves the old array to the garbage collector. You can easily avoid these reallocations by preallocating enough slots.

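using System.Collections.Generic;
using UnityEngine;

public class Inventory : MonoBehaviour {
	private List<GameObject> items;

	void Start() {
		// Reserve room for 25 items up front to avoid repeated resizing.
		items = new List<GameObject>(25);
	}
}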
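One way to check how often a list actually reallocates is to watch its Capacity property while adding items. A sketch with a hypothetical CapacityDemo class; List<T>.Capacity is a standard property, but the exact growth pattern is a runtime implementation detail:

```csharp
using System.Collections.Generic;

public static class CapacityDemo {
	// Adds n items and counts how many times the list's internal array was replaced,
	// detected as a change in Capacity.
	public static int CountResizes(List<int> list, int n) {
		int resizes = 0;
		int lastCapacity = list.Capacity;
		for (int i = 0; i < n; i++) {
			list.Add(i);
			if (list.Capacity != lastCapacity) {
				resizes++;
				lastCapacity = list.Capacity;
			}
		}
		return resizes;
	}
}
```

Comparing CountResizes(new List<int>(), 25) with CountResizes(new List<int>(25), 25) shows several reallocations for the first call and none for the preallocated list.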

Fun fact

Did you know that Dictionary.Keys and Dictionary.Values are not lists, but separate collection classes (Dictionary<TKey, TValue>.KeyCollection and ValueCollection)? Simply accessing a dictionary's Keys property allocates some bytes on the heap.
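If you only need to walk the entries, one option is to enumerate the dictionary itself instead of its Keys or Values property. A minimal sketch with a hypothetical helper; it relies on foreach over a generic Dictionary, which the Lists section above notes did not allocate in tests:

```csharp
using System.Collections.Generic;

public static class DictionaryIteration {
	// Sums the values by enumerating KeyValuePair entries directly,
	// without touching the Keys or Values properties.
	public static int SumValues(Dictionary<string, int> dict) {
		int sum = 0;
		foreach (KeyValuePair<string, int> pair in dict) {
			sum += pair.Value;
		}
		return sum;
	}
}
```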

References

Branching

A modern CPU executes instructions ahead of the current execution point. When it reaches a conditional branch, such as an if statement, it uses branch prediction to guess which path will be taken and speculatively continues down that path. If the guess turns out to be wrong, the speculative work is thrown away and execution restarts on the correct path, which costs many cycles. The more complex and harder to predict your branch is, the more often the CPU pays this misprediction penalty.

If your code features lots of branches, and is hard to rewrite, consider using a behaviour tree instead. Make sure the behaviour tree implementation doesn't use any if statements, and that none of the behaviour tree's nodes use any.

Most branches can be avoided using simple arithmetic and lookup arrays. I managed to reduce the runtime of a function from 22 ms to 2 ms by avoiding a few branches. It was a busy function, called around 2,900 times per update.

Avoiding branches is also a good practice because it often results in code that's easier to read and maintain.

Example

Call one function when a is equal to b, and another when they're not.

public class Example01 {
	
	public void Run(int a, int b) {
		if (a == b) {
			OneWay();
		} else {
			TheOther();
		}
	}
	
	private void OneWay() {}
	
	private void TheOther() {}
}

We can avoid the if statement using an array of functions and some simple arithmetic to figure out which branch to choose.

using System;

public class Example01 {
	
	private Action[] actions;
	
	public Example01() {
		actions = new Action[2] { OneWay, TheOther };
	}
	
	public void Run(int a, int b) {
		// 0 when a == b, 1 otherwise (assuming b - a doesn't overflow).
		int branch = Min(Math.Abs(b - a), 1);
		actions[branch]();
	}
	
	private void OneWay() {}
	
	private void TheOther() {}
	
	// a + b - |a - b| is always 2 * min(a, b), so integer division by 2 is exact.
	private int Min(int a, int b) {
		return (a + b - Math.Abs(a - b)) / 2;
	}
	
	// a + b + |a - b| is always 2 * max(a, b).
	private int Max(int a, int b) {
		return (a + b + Math.Abs(a - b)) / 2;
	}
}

To calculate the array index of the correct function, we first take the absolute value of the difference between a and b, which is never negative (as long as b - a doesn't overflow). Then we clamp that value to 0 or 1 using the Min method. I left the Max method in there so you can see how it compares to Min.

When a equals b, the difference is 0, and we use the first function in the actions array. For all other values the second function is used. Keep in mind that calling through a delegate is itself an indirect branch that the CPU has to predict, so profile to confirm that the change actually helps in your case.
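The same arithmetic can also select data instead of functions, which avoids the delegate call altogether. A minimal sketch, assuming a hypothetical scoring rule where matching values earn 10 points and non-matching values lose 5; the class and method names are made up for illustration:

```csharp
using System;

public static class BranchlessSelect {
	// Lookup table indexed by the branch result: 0 = a equals b, 1 = they differ.
	// The point values are hypothetical.
	private static readonly int[] deltas = { 10, -5 };

	// a + b - |a - b| equals 2 * min(a, b), so dividing by 2 is exact.
	public static int Min(int a, int b) {
		return (a + b - Math.Abs(a - b)) / 2;
	}

	// a + b + |a - b| equals 2 * max(a, b).
	public static int Max(int a, int b) {
		return (a + b + Math.Abs(a - b)) / 2;
	}

	// Returns 0 when a == b and 1 otherwise, assuming b - a doesn't overflow.
	public static int BranchIndex(int a, int b) {
		return Min(Math.Abs(b - a), 1);
	}

	// Applies the score change for comparing a and b without an if statement.
	public static int Score(int score, int a, int b) {
		return score + deltas[BranchIndex(a, b)];
	}
}
```

For example, Score(100, 4, 4) gives 110 and Score(100, 4, 9) gives 95. Math.Abs and the array bounds check may still compile to branches internally, but those are typically easy for the CPU to predict.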

Strings

Don't use strings for anything but displaying text in the UI or logging.

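It's better to cache strings and pass a reference to the cached string to the UI element than to build a new string every time. String literals are interned, so assigning a literal doesn't allocate by itself; the cost comes from strings created at runtime, for example through string.Format, concatenation or ToString().

public class StringCaching {
	
	private UIElement uiElm;
	private string[] labels;
	
	public StringCaching() {
		labels = new string[2] {
			"Hello world",
			"Hello there"
		};
	}
	
	// This is bad: a new string is built and allocated on every call
	public void RawString() {
		uiElm.label = string.Format("Hello {0}", "world");
	}
	
	// This is good: reuses a string that already exists
	public void CachedString() {
		uiElm.label = labels[0];
	}
}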

When doing string concatenation, use a StringBuilder. A StringBuilder can be reused, but be aware that emptying a StringBuilder causes memory allocations. In the .NET 3.5 profile used by older versions of Unity, StringBuilder's Clear method isn't available (it was added in .NET 4.0). Your only options are using Replace or setting Length to 0. These methods cause memory allocations that seem to be proportional to the size of the string.
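A minimal sketch of reusing one StringBuilder, assuming a hypothetical score label. Setting Length to 0 works on both old and new .NET profiles; note that ToString() still allocates the final string:

```csharp
using System.Text;

// Hypothetical label builder that reuses a single StringBuilder.
public class ScoreLabel {
	// Allocated once, with room for typical labels.
	private readonly StringBuilder builder = new StringBuilder(32);

	public string Build(string playerName, int score) {
		// Empty the builder; works where StringBuilder.Clear() isn't available.
		builder.Length = 0;
		builder.Append(playerName);
		builder.Append(": ");
		builder.Append(score);
		// ToString() still allocates the resulting string.
		return builder.ToString();
	}
}
```

Calling Build("Ada", 42) and then Build("Bob", 7) on the same instance returns "Ada: 42" and "Bob: 7" while reusing one builder.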

If you have a string with a constant length, like a timer that shows minutes and seconds (MM:SS), you should try using a character array instead.

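public class Timer {

	private char[] digits;
	private UIElement uiElm;
	
	public Timer() {
		// Fixed MM:SS layout; the colon never changes.
		digits = new char[5];
		digits[2] = ':';
	}
	
	// totalSeconds is the elapsed time; assumes it stays below 100 minutes.
	public void Update(int totalSeconds) {
		int minutes = totalSeconds / 60;
		int seconds = totalSeconds % 60;
		digits[0] = (char)('0' + minutes / 10);
		digits[1] = (char)('0' + minutes % 10);
		digits[3] = (char)('0' + seconds / 10);
		digits[4] = (char)('0' + seconds % 10);
		// new string(char[]) still creates one string per call.
		uiElm.label = new string(digits);
	}
}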

Depending on the length of your string, and how much you need to change, using a character array might not be the solution for you. Test using the profiler to see what causes the least amount of allocations in your case.

Don't use strings for comparisons. If you're searching for an object named something, see if you can use an Enum, or an integer ID instead.

Recycling

Use object pools so that you can reuse frequently used objects without the cost of a new allocation.

Make objects reusable by implementing a Reset method. A good practice is to call Reset from the constructor. That way the initial state and the reset state are defined in one place, so an object coming back from the pool is in the same state as a newly created one.

public class Poolable {
	
	private int value;
	
	public Poolable() {
		Reset();
	}
	
	public void Reset() {
		value = 0;
	}
}

Objects like game entities, sound effects, even messaging objects are good candidates for object pools.
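A minimal pool sketch built on the Reset pattern above. IPoolable, Pool<T> and Bullet are hypothetical names for illustration, not a Unity API:

```csharp
using System.Collections.Generic;

// Hypothetical interface matching the Reset method pattern described above.
public interface IPoolable {
	void Reset();
}

public class Pool<T> where T : class, IPoolable, new() {
	private readonly Stack<T> free;

	public Pool(int capacity) {
		// Create all objects up front so Get() doesn't allocate during gameplay.
		free = new Stack<T>(capacity);
		for (int i = 0; i < capacity; i++) {
			free.Push(new T());
		}
	}

	public int FreeCount {
		get { return free.Count; }
	}

	// Hands out a pooled object, or allocates a new one if the pool is empty.
	public T Get() {
		return free.Count > 0 ? free.Pop() : new T();
	}

	// Resets the object before putting it back, so the next Get() receives a clean one.
	public void Release(T item) {
		item.Reset();
		free.Push(item);
	}
}

// Hypothetical poolable object used in the usage example.
public class Bullet : IPoolable {
	public int Damage;

	public Bullet() {
		Reset();
	}

	public void Reset() {
		Damage = 0;
	}
}
```

Falling back to new T() when the pool runs dry keeps the game working, but it shows up as an allocation in the profiler, which is a hint to raise the initial capacity.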

Meshes

When changing the vertex colors of a mesh, use MeshRenderer.additionalVertexStreams to add a per-instance override. This way you avoid creating a new copy of the mesh for every object.

Delegates

When declaring delegates, default to an empty function, or cache an empty function, to avoid the extra null check. This simplifies the code and is better for branch prediction.

using System;

public class DelegateCaching {
	
	// A do-nothing action, so callback is never null.
	private static readonly Action nullAction = () => {};
	private Action callback;
	
	public DelegateCaching() {
		callback = nullAction;
	}
	
	public void Update() {
		callback();
	}
}

In this example we create an empty function using a lambda expression and point callback to it. nullAction could also be a private method, as long as it has the correct signature. In Update() we don't need to check whether callback is null.

However, be careful with lambdas. The compiler caches lambdas that don't capture anything, which makes them safe to use. But it's easy to create a closure by referencing a local variable, parameter or instance member from the surrounding code, and that breaks the caching. In those cases the compiler generates a class to hold the captured variables, and a new delegate (plus an instance of that generated class) is allocated each time the lambda expression is evaluated. These allocations pollute the heap and stress the garbage collector.

Don't do like this:

public class LambdaCaching {
	
	private List<MyObject> list;
	
	public LambdaCaching() {
		list = new List<MyObject>(10);
		list.Add(new MyObject());
		// etc.
	}
	
	public void Update() {
		int l = list.Count;
		for (int i=0; i<l; i++) {
			Callback(i, myObj => {
				myObj.DoSomething(i);
			});
		}
	}
	
	private void Callback(int i, Action<MyObject> callback) {
		callback(list[i]);
	}
}

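The problem in this example is that the lambda references the loop variable i, which turns it into a closure. The compiler generates a class that holds i, and at runtime a new delegate is allocated on every iteration, ten times every update for a list of ten items, on top of the object that holds the captured variable.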
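One way to avoid the capture is to pass i as a parameter, so the lambda only uses its own arguments and can be created once. A sketch under the assumption that MyObject looks roughly like the hypothetical stand-in below; NonCapturing is also a made-up name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for MyObject from the example above.
public class MyObject {
	public int Total;

	public void DoSomething(int i) {
		Total += i;
	}
}

public class NonCapturing {
	private readonly List<MyObject> list = new List<MyObject>(10);

	// Created once; the lambda uses only its parameters, so it captures nothing.
	private static readonly Action<MyObject, int> doSomething =
		(myObj, i) => myObj.DoSomething(i);

	public NonCapturing(int count) {
		for (int n = 0; n < count; n++) {
			list.Add(new MyObject());
		}
	}

	public MyObject this[int index] {
		get { return list[index]; }
	}

	public void Update() {
		int l = list.Count;
		for (int i = 0; i < l; i++) {
			Callback(i, doSomething);
		}
	}

	// i is passed explicitly instead of being captured by the lambda.
	private void Callback(int i, Action<MyObject, int> callback) {
		callback(list[i], i);
	}
}
```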

Prefabs loaded via Resources

Unload prefabs from memory after loading them from Resources.

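using UnityEngine;

public class Loader {

	public GameObject Load(string asset) {
		GameObject prefab = Resources.Load<GameObject>(asset);
		GameObject instance = GameObject.Instantiate(prefab);
		// Resources.UnloadAsset can't be used on GameObjects, so drop the reference
		// and let Unity unload assets that nothing references anymore.
		// UnloadUnusedAssets is expensive; consider batching it, e.g. after loading a level.
		prefab = null;
		Resources.UnloadUnusedAssets();
		return instance;
	}
}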

When you load a prefab from Resources, always instantiate it; otherwise, any changes you make will be applied to the prefab itself. Since instantiating duplicates the object, there's no reason to keep the prefab around.

Caching array length

Array.Length doesn't need caching, but List<T>.Count does. See Jackson Dunstan's measurements: http://jacksondunstan.com/articles/3577

Data structures

Although not directly an optimization tip, the data structure you choose largely determines how much of your code you can actually optimize. It's a task well worth spending time on to get right.

TL;DR: Keep it simple!

Data should be easy to work with. The API should help you get the right data with the least amount of input, in the shortest time possible. Keep the API simple to make it easy to learn, but add as many method overloads as you need to make it flexible.

Decide on a set of basic data blocks and design the API around them. In a game of pool, a ball could be a basic data block. The table holds a list of balls, and the pockets are functions that accept a ball as a parameter.

You should be able to pass the data blocks you get from the API back to it without the need to transform them. In an adventure game, when a character picks up an item from the ground, it shouldn't need to create a new instance of an object just to be able to add it to the inventory.

When deciding on a data structure, prioritize the simplest first. If you don't need a resizable list, choose an Array instead. If you need a more complex structure, add helper methods to make it easier for yourself and your peers to work with the data.

Keep scalability in mind when designing the data structure, but don't overdo it. The more scalable the data is, the harder it is to work with. Scalable data means the possibility to add more data without breaking the API or the structure.

Whenever you want to add more properties to your data, think it through. Can you get the information you're looking for from other properties? Can you merge the new property with another to keep the API simple? In a platformer game the characters probably have a health property. Instead of adding a boolean called isDead, can you instead check for health == 0? Always go for the least amount of data that you need, and pick the data that gives the most meaning.
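A minimal sketch of the health example, assuming a hypothetical Character class. IsDead is computed from Health instead of being stored as a separate boolean that could fall out of sync:

```csharp
using System;

// Hypothetical character type for illustration.
public class Character {
	public int Health { get; private set; }

	public Character(int health) {
		Health = Math.Max(0, health);
	}

	// Derived from Health, so there is no second field to keep in sync.
	public bool IsDead {
		get { return Health == 0; }
	}

	public void TakeDamage(int amount) {
		// Clamp at zero so the Health == 0 check stays reliable.
		Health = Math.Max(0, Health - amount);
	}
}
```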

Good practice

Gain control over the garbage collector by doing large cleanups when the game is not doing anything important, like in a menu scene, and avoid creating allocations when the game is running.

Keep classes small and simple. It makes the code easier to reason about and bugs easier to fix. When a class grows beyond 300 lines, split it. If you really need a large class, split it across files using partial classes.

Splitting code into smaller functions makes it easier to reuse code and identify bottlenecks during profiling. Reusable functions also reduce the need to write more code, which results in less time spent hunting bugs.

Write code in decoupled, self-contained classes with methods that accept as few parameters as possible and have no hard dependencies on other classes, to make it easier to run isolated tests. Your development cycle will be slow if you always need to compile the entire project and play up to a specific point in the game to test a certain feature. Make it easy to either mock data or load data from disk.

More stuff on Unity optimizations

@OscarAbraham

OscarAbraham commented Nov 28, 2020

Hi, thanks for lots of helpful ideas. I believe the information in the section about branching is not completely right. In your example, where you use delegates to avoid the if condition, there's still branching. All these kinds of indirect calls (delegates, virtual methods, function pointers) involve branching, because the CPU doesn't know which function will be executed until it arrives at the call. In other words, the CPU doesn't know for sure what function to load ahead of time.

I believe these kinds of tricks could avoid branch mispredictions when the function address is known and doesn't change often. Yet, I'm not really sure it works in your example. I guess, depending on where "a" and "b" come from, it could make some CPUs predict the branch better than when using a condition. The modern heuristics of branch prediction are really complex, so I don't discard the idea completely. I've seen weirder things work in very specific cases.

That said, I usually get slightly worse performance with these kinds of tricks, because of the function call, the cache misses, and the overhead of C# delegates or the vtable lookup of virtual methods. I saw a benchmark of something like that and it got roughly the same amount of branch mispredictions, but I can't find the Stack Overflow answer now.

If someone reads this, I'd love to see an example of a Behavior Tree being faster by doing these things. I never heard of something like that. After reading this, I spent about five hours searching the web without finding anything. I'd really appreciate it; I'm really curious about this concept.

Thanks for taking the time to read all this. :) I hope you have a very nice day.

@mandarinx
Author

Hi!

It was a bit embarrassing reading the section about branching again, after all these years. :-) I agree, it's far from perfect. You're right, it's still branching even if I use a delegate. I should have provided more context. In my experience, what matters the most is profiling your own code. In my case, I shaved off 20 ms of execution time by using delegates.

I think what it boils down to is what kind of work you give the CPU to base the branch prediction on. In this case, it's simple arithmetic. It should be easy for the CPU, as long as it has the correct data to do the calculation, to get the correct branch immediately.

I remember the method I was optimizing, it was a recursive one. It called itself around 2900 times per update. I can only assume the penalty of loading the correct delegate was being paid on the first iteration, and then it was kept in cache, and was much quicker to get on the next iterations.

I've spent most of the last 10 years or so learning the ins and outs of dotnet, C# and Mono to be able to optimize my code. What I find is that tricks that worked in one project don't matter that much in another. It always boils down to profiling to figure out what works for your current code. Sometimes I try old tricks, just to see what happens, even if I'm not sure they'll work. Now that Unity is using IL2CPP, it doesn't seem like all of the optimization tricks work. Some were maybe specific to Mono.

Don't spend too much time on the behaviour tree research. I don't think my example code is worthy of the name behaviour tree. :-)

@OscarAbraham

Thanks, mandarinx! I hadn't thought about recursion. What you say makes perfect sense to me, even though 20 ms still surprises me. Yeah, I guess you're right that optimization strategy varies a lot with each case. I'm currently trying to optimize a BT and HTN implementation and I'm trying everything I can find to see what sticks.

I think that, while it's true that some optimizations are probably only applicable to Mono, another reason for the difference is that IL2CPP is still lacking in some respects. I think the main example is virtual methods, especially interface methods, which I've found to be considerably slower than many other implementations. Other less notable examples I know of are statics and generics, which add some code at the start of every method that uses them and, in some edge cases, make IL2CPP a lot slower than Mono. So, maybe some optimizations that work in Mono will work again in IL2CPP if those things get fixed.

Thanks a lot for taking the time to help a stranger; I really appreciate it. Happy Holidays.

@mandarinx
Author

mandarinx commented Nov 30, 2020 via email

@midiphony

Hello! :)
A later article from Jackson Dunstan seems to indicate that using foreach on Lists doesn't allocate anything since Unity 5.6 : https://www.jacksondunstan.com/articles/3805

Could you add a reference to this article and maybe update the List section?
