
@mandarinx
Created October 26, 2016 06:47
Data Oriented Design

Data Oriented Design in game development

DOD is about

  • Understanding your data.

Mike Acton from Insomniac Games said that in order to understand the problem, you need to understand the data. If you don’t understand either of them, you will never find the optimal solution.

  • Focusing on the data flow and how data is read and written.

The hard part of programming is never the syntax; it is knowing, at any point during execution, what the value of your data is and who touched it.

OOP

The main concern in OOP is encapsulation. We construct objects in such a way that they reflect our understanding of the design of the game. We often use real world analogies when naming the objects, properties and methods, and we try to structure them in a logical hierarchy. In a racing game, you can be pretty sure to find a class called Car somewhere in the source code. Depending on the implementation the class might include properties like maxSpeed, turnRadius and methods like Drive() and Brake(). The design of the application is very often tightly coupled with the implementation.

Being able to structure the code in the same way you view the design of the application makes it easy to quickly get something up and running. In a way it makes sense to create a class for everything you can see. With a class for the car, the track, trees and finish line you can already see the contours of a racing game. One can say this is a strength of OOP, but it is also a weakness.

When the code is tightly coupled to the design, any change to the design risks forcing a refactor of the code. Refactoring gets harder and more time-consuming as the code grows. The bigger the refactoring, the greater the chance of introducing breaking changes. Often, because of time and budget constraints, we avoid the refactoring and work around it instead.

Object hierarchies often make it very hard to decide where to extend the code. Data is usually scattered across many objects, so the data you need might not be easily accessible. This can lead you to write glue code to gather and prepare data before passing it on. Or you can take the easy way and start passing references to entire manager objects, or even resort to singletons to make everything accessible to anyone. These kinds of solutions muddle the function definitions and make it hard to follow the data flow.

DOD

DOD proposes a solution to these issues by shifting the focus from objects to data.

DOD is not a new idea in the games industry. Programmers wrote data-oriented code long before OOP was introduced. I assume this approach was taken because it was practical. Early computers didn’t have many resources. The CPUs weren’t as sophisticated as they are today, and their capacity was only a fraction of what’s mainstream now. So you had to write code that was optimal for the computer. That it’s an old way of coding doesn’t mean it’s outdated; quite the contrary.

Abner Coimbre is a developer working on the launch systems at the Kennedy Space Center. He and people like Casey Muratori, Jonathan Blow, Mike Acton, Tony Albrecht, Niklas Frykholm and a handful of others are the most vocal proponents of DOD in the games industry.

Jonathan Blow has been experimenting with creating a new programming language, specifically designed for games. It’s called JAI, and has explicit support for data-oriented design, while eschewing the traditional OOP paradigm.

Abner Coimbre posted a video on YouTube where he talked about what programming is about. He said that programming is never about the code, it’s always about the data. When you look at a programming language from a data oriented perspective, it makes sense.

A programming language has many different data types. They are there to specify your data and set the rules for how to work with them. You will also find lots of different operators, all there to make it possible for you to do operations on the data. There are also many different statements, which all require data in order to work.

Seen from this perspective, it’s easy to understand why programmers adopt a data-oriented approach.

Decoupling

This way of thinking completely separates functionality and data, which is in stark contrast to OOP.

Going back to the racing game example we see that the data we put in the Car class is tightly coupled to the context of a car. Even though there are many things that we can apply acceleration to, the acceleration property on the car belongs to the car. Such coupling of data and context contributes to making it harder to write reusable code.

In data-oriented design we build general-purpose functions, free from context, that we compose to give meaning to the data. In DOD these functions are called data transforms. A data transform is a function with limited responsibility that is only allowed to work on the data passed in via its parameters. Building functions this way makes it easy to couple them to and decouple them from data, and therefore also to adapt to design changes.
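To make the idea of a data transform a bit more concrete, here is a minimal C++ sketch. The function names `apply_acceleration` and `apply_velocity` are my own illustrations, not taken from any particular engine, and the one-dimensional float arrays are an assumption to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical data transform: integrates acceleration into velocity.
// It knows nothing about cars; it only touches the arrays passed in.
void apply_acceleration(std::vector<float>& velocity,
                        const std::vector<float>& acceleration,
                        float dt) {
    assert(velocity.size() == acceleration.size());
    for (std::size_t i = 0; i < velocity.size(); ++i) {
        velocity[i] += acceleration[i] * dt;
    }
}

// A second hypothetical transform, composed after the first:
// integrates velocity into position.
void apply_velocity(std::vector<float>& position,
                    const std::vector<float>& velocity,
                    float dt) {
    assert(position.size() == velocity.size());
    for (std::size_t i = 0; i < position.size(); ++i) {
        position[i] += velocity[i] * dt;
    }
}
```

Because neither function mentions a car, the same two transforms could be run over the data for cars, particles or anything else that moves. Calling `apply_acceleration` and then `apply_velocity` on the same arrays is what gives the data its meaning.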

When I worked with Hyper Games on the Statnett Balance game, we created an architecture with a focus on data. All data that needs to be serialized and kept for state management is contained in a single object. Although we didn’t write pure DOD-style code, this approach made it easy to create tools for use both in the editor and at runtime.

Homogenous data

DOD always assumes you are working on more than one element. Writing code so that it can handle multiple elements makes it scalable, and syntactically it differs very little from working on single elements.

In OOP we use Array of Structures (AoS). When you create an object pool of e.g. Particle objects with properties for position, scale and color, the data will be laid out in memory like this: `[ABC][ABC][ABC]`. In DOD we flip it around and use Structure of Arrays (SoA) instead. We break apart the Particle object and organize the data in arrays: one for position, one for scale and one for color. Now the memory layout is like this: `[AAA][BBB][CCC]`. With this approach we have turned the particle data into homogenous strips of data.

The concept of a particle object exists only implicitly. We have to look up the data from e.g. position 0 in all of the arrays to get the complete particle. But because some data is used more often than other data, we don’t always need to look up a complete particle.
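The two layouts can be sketched in C++ roughly like this. The type and function names (`Vec3`, `Color`, `ParticlesSoA`, `to_soa`, `particle_at`, `translate`) are made up for illustration, and the field types are an assumption about what a particle might hold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3  { float x, y, z; };
struct Color { float r, g, b, a; };

// Array of Structures: every element carries all fields -> [ABC][ABC][ABC]
struct Particle {
    Vec3  position;
    Vec3  scale;
    Color color;
};
using ParticlesAoS = std::vector<Particle>;

// Structure of Arrays: one array per field -> [AAA][BBB][CCC]
struct ParticlesSoA {
    std::vector<Vec3>  positions;
    std::vector<Vec3>  scales;
    std::vector<Color> colors;
};

// Splits an AoS pool into the three homogenous arrays.
ParticlesSoA to_soa(const ParticlesAoS& aos) {
    ParticlesSoA soa;
    soa.positions.reserve(aos.size());
    soa.scales.reserve(aos.size());
    soa.colors.reserve(aos.size());
    for (const Particle& p : aos) {
        soa.positions.push_back(p.position);
        soa.scales.push_back(p.scale);
        soa.colors.push_back(p.color);
    }
    return soa;
}

// The particle only exists implicitly: gather index i from every array.
Particle particle_at(const ParticlesSoA& soa, std::size_t i) {
    assert(i < soa.positions.size());
    return Particle{soa.positions[i], soa.scales[i], soa.colors[i]};
}

// Moving the particles only needs the positions array.
void translate(std::vector<Vec3>& positions, Vec3 offset) {
    for (Vec3& p : positions) {
        p.x += offset.x;
        p.y += offset.y;
        p.z += offset.z;
    }
}
```

Note that `translate` never sees scale or color. A complete particle is only assembled by `particle_at` in the places that actually need one.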

Working on homogenous data is better for the CPU’s internal caching. Modern CPUs usually have a few levels of caches, which are referred to as L1 and L2. Depending on your target platform, you may find a third or fourth level.

When you loop through an array, the CPU starts by fetching the data from main RAM. It tries to stay ahead by fetching as much of the array as it can fit in its L2 cache. In the case of the object-oriented particle system, it will fetch entire particle objects at a time, including the fields the loop never reads. The larger the particle object, the fewer particles fit in the cache, so the CPU more often fails to find the next particle there and has to fetch it from main RAM. This continues throughout the array.

In the data-oriented particle system we loop through each array separately. This is better for the cache because a Vector3 or a color object is much smaller than a particle class, so the CPU can fit more useful data into its L2 cache. DOD helps reduce the number of so-called cache misses and improves performance.
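A rough back-of-the-envelope calculation shows why the smaller element matters. The sketch below assumes a 64-byte cache line, which is common on desktop CPUs but not universal, 4-byte floats, and a hypothetical particle made of two `Vec3` fields and one `Color`. The helper names are my own.

```cpp
#include <cstddef>

struct Vec3  { float x, y, z; };
struct Color { float r, g, b, a; };
struct Particle { Vec3 position; Vec3 scale; Color color; };

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

// Assumed cache line size; check the value for your target platform.
constexpr std::size_t kCacheLineBytes = 64;

// How many whole elements of type T fit in one cache line.
template <typename T>
constexpr std::size_t elements_per_cache_line() {
    return kCacheLineBytes / sizeof(T);
}

// How many cache lines a contiguous, line-aligned array of `count`
// elements of type T spans (rounded up).
template <typename T>
constexpr std::size_t cache_lines_touched(std::size_t count) {
    return (count * sizeof(T) + kCacheLineBytes - 1) / kCacheLineBytes;
}
```

Under these assumptions a `Particle` is 40 bytes and a `Vec3` is 12 bytes. Updating the positions of 1000 particles spans 625 cache lines in the AoS layout, but only 188 when the positions live in their own array.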

This is a problem that the proponents of DOD have been talking about for years. Processor speed has been increasing more rapidly than memory speed, creating an ever-growing gap between the two. With each new generation of CPUs, the memory latency measured in CPU cycles grows larger, making it more important to keep as much of the most important data in the local caches as possible.

It’s hard to find accurate data on the actual latency, because it depends on your target platform. On the PS2 with its 300 MHz processor, the latency was about 40 CPU cycles. On the PS3 with a 3.2 GHz CPU, the latency was about 600 CPU cycles. That is 15 times as many cycles as on the PS2. Based on some data I found online, fetching data from RAM on a desktop-class CPU is around 200 times slower than fetching from its local cache.

In Statnett Balance we had a case of a function eating up lots of CPU time. The time spent in the function would grow as the number of entities increased. I set up a test level and filled it with more entities than we would need. The function was called recursively and would call itself about 4500 times per update, taking up 22 ms. That’s more than a full frame if you target 60 fps. I was able to optimize the function by laying out some of the data in arrays and using arithmetic to control the data flow, thus avoiding a few branches. The end result was that the function would still be called about 4500 times, but now it only spent 2.2 ms per frame.

Optimization

DOD is a perfect fit for when you need to optimize your code. With a data-oriented approach you always make sure to lay out data in a way that is optimized for performance.

Mathematics, especially arithmetic and algorithms, becomes a very useful tool. There’s a lot of help to be found on the internet for using algorithms to solve specific problems. Computers are very good at mathematics, so by using mathematics you write computer-friendly code.

There are tons of optimization tricks out on the internet, and many have been written specifically for Unity. But there’s one I haven’t seen mentioned much in the context of Unity optimizations, and that is how to handle if-statements. They are hard to avoid, but not impossible. It’s certainly easier when writing data-oriented code. When using Unity though, you will never get rid of them, because so many parts of the Unity API return booleans and force you to branch.

Branching can be bad because a wrongly predicted branch wastes CPU time. Modern CPUs have something called branch prediction, which is a mechanism for trying to predict which branch of the if-statement your code will take. The CPU will make a best guess and start executing the instructions of the branch it thinks your code will choose. Sometimes it’s correct and the code can continue, while other times it’s wrong and it has to discard all of those instructions and execute the correct branch instead.

The rules for branch prediction seem to be either complicated or not documented, because I haven’t found a clear explanation of what causes branch mispredictions. The best approach is to avoid if-statements where you can, or at least minimize their use. Rewriting your code to avoid branches can also make it a lot easier to read.

There are many tricks to avoid branching, and it could be a talk of its own. It’s a very nice exercise to try to get rid of some of the branches in your own code. I encourage you to try it! You’ll be surprised to see how easy it is sometimes.
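As one small example of such a trick, a comparison can be turned into a 0 or 1 and used as a multiplier instead of guarding the work with an if-statement. The sketch below is only an illustration with made-up function names. It is not the code from Statnett Balance, and an optimizing compiler may already produce branch-free machine code for the simple if-version, so measure before and after.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Branching version: one if-statement per element.
int sum_above_branchy(const std::vector<int>& values, int threshold) {
    int sum = 0;
    for (int v : values) {
        if (v > threshold) {
            sum += v;
        }
    }
    return sum;
}

// Arithmetic version: the comparison yields 0 or 1, which is used as a
// multiplier, so the loop body contains no if-statement.
int sum_above_branchless(const std::vector<int>& values, int threshold) {
    int sum = 0;
    for (int v : values) {
        sum += v * static_cast<int>(v > threshold);
    }
    return sum;
}

// Same idea on game-like data: `active` holds 1 for entities that should
// take damage and 0 for those that should not (a hypothetical layout).
void apply_damage(std::vector<float>& health,
                  const std::vector<std::uint8_t>& active,
                  float damage) {
    assert(health.size() == active.size());
    for (std::size_t i = 0; i < health.size(); ++i) {
        health[i] -= damage * static_cast<float>(active[i]);
    }
}
```

Both sum functions return the same result for any input. The difference is only in whether the source code asks the CPU to choose a path per element or to do the same arithmetic for every element.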

Lastly, DOD is, according to the experts, a very good fit for multithreaded games. I’m no expert on threading, so I don’t have much to add. If you’re working on multithreading your code and are looking for answers, you might find some in what’s written about DOD.

Conclusion

Data-oriented design is by no means opposed to object-oriented programming, just some of its ideas. As a result, you can use ideas from data-oriented design and still get most of the abstractions and mental models you're used to from OOP.

You can use DOD during the design of your data, and OOP during the design of your code. This should give you the best of both worlds. In general, DOD should help your OOP goals.

There are going to be times when the two conflict. You may have cases where the best way to lay out data does not match the best way to design your objects. In most of these cases it should be a simple matter of deciding which is more important, and which can be sacrificed.
