This is the second entry in my series about API design. While my last post covered granularity and request-based APIs, this time I want to write about API design and code architecture for modular data-oriented processes. It is basically a write-up of my findings while writing and finishing the core of my current GUI research project, quarks.
What actually sparked my interest in writing up my findings, however, was rasmusbarr@github (rasmusbarr@twitter) releasing nudge, his small data-oriented and SIMD-optimized 3D rigid body physics library. I noticed a lot of overlap in design between his physics library and my GUI library and wanted to write up some general thoughts about what seems to be a generally "elegant" solution for these types of problems.
First off, I want to define what exactly I mean by modular data-oriented process API design. I want to start with "data-oriented", which, as far as I have seen, has two different meanings or interpretations. The first, spearheaded (at least as far as I know) by Mike Acton, is converting arrays of structs (AoS) into structs of arrays (SoA) to achieve higher performance by using SIMD capabilities. The second is storing as much information as possible in data and providing only a small number of simple functions that act on this data. I am more interested in the second interpretation here. One example of the second type is the current generation of graphics APIs, which require everything to be created and set up before actually "committing" it by executing a draw call. However, I want to differentiate between layer-based and module-based APIs. By "modular" I mean an API that strictly requires input and provides output data, unlike a graphics API, which in its purest sense is more of a data sink. Finally, by "processes" I mean the basic computer science steps of data processing: Input => Processing => Output (so a rather functional view of processing).
So, after hopefully providing a somewhat definitive overview of what this post is about, I want to dive right into the most important part of data-oriented design: the data. Since we have two data variants, input and output, I will start at the beginning with input data.
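To make the first interpretation concrete, here is a minimal C sketch of converting AoS into SoA. The particle types and function names are hypothetical and only illustrate the memory layout. They are not taken from any specific library.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structs (AoS): all fields of one particle are stored together. */
struct particle { float x, y, vx, vy; };

/* Struct of arrays (SoA): each field lives in its own contiguous array,
 * so a loop over one field only touches that field's memory. */
#define MAX_PARTICLES 1024
struct particles {
    float x[MAX_PARTICLES], y[MAX_PARTICLES];
    float vx[MAX_PARTICLES], vy[MAX_PARTICLES];
    size_t count;
};

/* Convert an AoS array into the SoA layout (hypothetical helper). */
static void particles_from_aos(struct particles *out, const struct particle *in, size_t n)
{
    assert(n <= MAX_PARTICLES);
    for (size_t i = 0; i < n; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->vx[i] = in[i].vx;
        out->vy[i] = in[i].vy;
    }
    out->count = n;
}

/* Integrate positions: every statement streams through contiguous floats. */
static void particles_integrate(struct particles *p, float dt)
{
    for (size_t i = 0; i < p->count; ++i) {
        p->x[i] += p->vx[i] * dt;
        p->y[i] += p->vy[i] * dt;
    }
}
```

In the SoA version the integration loop walks plain float arrays. Compilers and hand-written SIMD code can usually vectorize this layout more easily than the interleaved AoS one.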
In general, I found that input data should be passed into processing functions in a form that perfectly fits the processing algorithm, programming language, and performance requirements. Often the way data is stored in the application or in a file does not fit the data model required for processing. The two differ in access patterns, size, or complexity, since often only a fraction of all existing data is required for a single processing step.
One solution to this problem is retained mode APIs, which manage a copy of all state required for processing in their preferred format. However, they have some big drawbacks. First and most obviously, they require you to manually copy all state and then keep two versions of the data up to date at all times, which is often absolutely tedious. Second, these frameworks require a lot of control and become extremely bulky and complex over time as they accommodate a growing number of requirements.
Another solution, and at least for me a more interesting one, is adding an explicit conversion step before each processing pass. This conversion step is an immediate mode API whose sole purpose is to take in user data in whatever form the application provides and stores it, and convert it into a data format suitable for processing. Note that this conversion is done in code, and that code generates data. So you basically write code that generates data.
Of course, this explicit conversion step has costs, both in performance and in memory. Performance can be an actual problem for very big datasets, but memory is usually an easier problem, since all additional memory is only required until the end of the processing step and can then be freed. To simplify memory management even more, you could use a linear/frame allocator, which allocates big blocks of memory and frees all of it in one go.
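Here is a minimal sketch of such a conversion step in C. It assumes a hypothetical physics-style processing step that only needs positions and inverse masses. All names (`game_entity`, `sim_input`, `sim_push_body`, `convert_entities`) are made up for illustration. The point is that application data stays in its own format and a small immediate mode API generates the processing input from it every update.

```c
#include <assert.h>
#include <stddef.h>

/* Application-side data, stored the way the application likes it:
 * a linked list with fields the processing step never reads. */
struct game_entity {
    const char *name;
    int active;
    float pos[3];
    float mass;
    const struct game_entity *next;
};

/* Processing-side input (hypothetical): only what the algorithm needs,
 * packed contiguously in caller-provided memory. */
struct sim_body { float pos[3]; float inv_mass; };
struct sim_input { struct sim_body *bodies; size_t count, capacity; };

/* Immediate mode API: begin, then push one body per object. */
static void sim_input_begin(struct sim_input *in, struct sim_body *mem, size_t capacity)
{
    in->bodies = mem;
    in->count = 0;
    in->capacity = capacity;
}

static void sim_push_body(struct sim_input *in, const float pos[3], float mass)
{
    assert(in->count < in->capacity);
    struct sim_body *b = &in->bodies[in->count++];
    b->pos[0] = pos[0];
    b->pos[1] = pos[1];
    b->pos[2] = pos[2];
    b->inv_mass = mass > 0.0f ? 1.0f / mass : 0.0f; /* 0 marks a static body */
}

/* The "code that generates data": walk application data, emit processing input. */
static size_t convert_entities(struct sim_input *in, const struct game_entity *list)
{
    for (const struct game_entity *e = list; e; e = e->next)
        if (e->active) /* only a fraction of the data is needed */
            sim_push_body(in, e->pos, e->mass);
    return in->count;
}
```

Because the conversion is just code, it is also the natural place to drop data the processing step does not need, like the inactive entities skipped here.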
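A linear allocator can be very small. The following sketch assumes the caller hands it one fixed block of memory up front. The names `frame_alloc`, `frame_push` and `frame_reset` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linear (frame) allocator: allocates by bumping an offset inside one block,
 * and releases everything at once by resetting that offset. */
struct frame_alloc {
    unsigned char *base;
    size_t size, used;
};

static void frame_init(struct frame_alloc *a, void *mem, size_t size)
{
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* align must be a power of two; returns NULL when the block is exhausted. */
static void *frame_push(struct frame_alloc *a, size_t bytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)(a->base + a->used);
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->size || bytes > a->size - offset)
        return NULL;
    a->used = offset + bytes;
    return a->base + offset;
}

/* Free everything allocated since the last reset in one go. */
static void frame_reset(struct frame_alloc *a)
{
    a->used = 0;
}
```

The conversion step would push its output into such an allocator. Once processing is done, a single `frame_reset` releases all of it.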
So far the concept was very clear to me. While working on quarks, however, I noticed that input data can come in different forms. The first type is what I have already described: data that is pushed each frame/update/application state change/... through an immediate mode API/code conversion step.
I also noticed two additional types. The first is quite obvious in retrospect: data that is known at compile/load time. By that I mean data that can already be stored at compile/load time in a format suitable for processing and does not require any kind of modification or conversion. Quick examples are almost all widgets in GUI libraries and static objects/meshes in physics libraries. For GUI libraries it makes sense to keep everything known to exist in immutable compile-time tables. For meshes, you could load or stream in the correct processing format from file without requiring a conversion step. Side note: honestly, just writing this makes me feel stupid; that is how obvious it is, and it took me way too long to notice.
The second additional, and third overall, data type is data that spans both runtime and compile/load time. It has parts that are known at compile/load time but requires additional data only known at runtime. An example for GUI libraries is plugins, which require the immutable compile-time tables to be extended at runtime with additional widgets. For physics libraries it could be that all dynamic objects are known at compile time but their transformation data is only known at runtime. So you would have two sets of data working on the same object. One is generated at compile/load time; the other is generated and converted at runtime by the immediate mode API.
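For the compile-time case the data can simply live in a `static const` table that the processing step reads as-is. The widget layout below is a hypothetical sketch, not taken from quarks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical widget description, already in the format the processing
 * step consumes, so no runtime conversion is needed. */
enum widget_type { WIDGET_PANEL, WIDGET_LABEL, WIDGET_BUTTON };
struct widget {
    enum widget_type type;
    int parent;            /* index into the same table, -1 for the root */
    const char *text;
};

/* Known at compile time: an immutable table in read-only memory. */
static const struct widget settings_ui[] = {
    { WIDGET_PANEL,  -1, "Settings" },
    { WIDGET_LABEL,   0, "Volume"   },
    { WIDGET_BUTTON,  0, "Apply"    },
};
static const size_t settings_ui_count = sizeof settings_ui / sizeof settings_ui[0];

/* A processing step can read the table directly, e.g. to count children. */
static size_t count_children(const struct widget *w, size_t n, int parent)
{
    assert(w != NULL);
    size_t c = 0;
    for (size_t i = 0; i < n; ++i)
        if (w[i].parent == parent)
            ++c;
    return c;
}
```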
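Here is a sketch of the mixed case for the physics example, assuming objects are identified by their index. Shape data sits in a compile-time table, while transforms are pushed at runtime into a parallel array with the same indexing. The types and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Part known at compile/load time: per-object shape data (hypothetical). */
struct shape { float half_extent[3]; float mass; };
static const struct shape shapes[] = {
    { {0.5f, 0.5f, 0.5f}, 1.0f },   /* object 0 */
    { {1.0f, 0.1f, 1.0f}, 4.0f },   /* object 1 */
};
#define OBJECT_COUNT (sizeof shapes / sizeof shapes[0])

/* Part only known at runtime: transforms, pushed every update through an
 * immediate mode call into an array indexed like the static table. */
struct transform { float pos[3]; };
struct frame_input { struct transform transforms[OBJECT_COUNT]; };

static void push_transform(struct frame_input *in, size_t object, float x, float y, float z)
{
    assert(object < OBJECT_COUNT);
    in->transforms[object].pos[0] = x;
    in->transforms[object].pos[1] = y;
    in->transforms[object].pos[2] = z;
}

/* Processing combines both sets, e.g. the lowest point of a box. */
static float object_bottom(const struct frame_input *in, size_t object)
{
    assert(object < OBJECT_COUNT);
    return in->transforms[object].pos[1] - shapes[object].half_extent[1];
}
```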
Finally, for input I want to address another interesting point. As previously stated, immediate mode APIs generate data from code. This is quite obvious if you look at old-school immediate mode rendering APIs, which generated vertex data by calling a function for each vertex component. What is interesting about this data-from-code concept, however, is that it is possible, and I would even say desirable, to take the data generated from code and write it out in some data format like a C header or a custom text or binary format. For one, it makes debugging a lot easier, since you can generate data by code, dump it to a file, and have a bigger set of tools to examine it. If it is then possible to feed this data back into the processing step, you come full circle. In addition, you can use your editor for compile/load time data generation, while it is still possible to use external tools to generate data as well. For GUIs it is probably desirable to have an external editor to create the biggest chunk of the UI beforehand. The same goes for game map editors or other preprocessing steps that generate static mesh data which can be fed directly into physics processing.
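Writing code-generated data back out as a C header can be as simple as printing it as an initializer. This is a minimal sketch with a hypothetical `struct vec2` and `dump_points`. The generated header assumes `struct vec2` is declared wherever it gets included. Floats are printed with `%.9g`, which gives enough significant digits to round-trip a `float`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct vec2 { float x, y; };

/* Write points as a C initializer; returns 0 on success, -1 on I/O error. */
static int dump_points(FILE *f, const char *name, const struct vec2 *p, size_t n)
{
    assert(f != NULL && name != NULL && strlen(name) > 0);
    if (fprintf(f, "static const struct vec2 %s[%zu] = {\n", name, n) < 0)
        return -1;
    for (size_t i = 0; i < n; ++i)
        if (fprintf(f, "    { %.9g, %.9g },\n", p[i].x, p[i].y) < 0)
            return -1;
    return fprintf(f, "};\n") < 0 ? -1 : 0;
}
```

Including the generated file in a later build turns data generated at runtime into compile-time data, and the same dump doubles as a readable debugging artifact.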
The next step is processing. Almost everything related to processing depends on the actual problem and algorithm and is therefore not really generalizable, but there are still some interesting points to raise. The most interesting one is that, because of everything I described in the input step, it is often possible to calculate beforehand how much memory the final persistent output data requires. So you can allocate the required amount of persistent memory and just pass it into the processing step. Sometimes it is also possible to calculate how much temporary memory is required. Even if not, you can reuse the temporary allocator and free all memory after processing. Finally, this processing step is purely deterministic and, depending on the problem to be solved, can even be pure in a functional sense. That opens up more debugging options: you can keep a fixed number of past states in memory to inspect values over time and step backwards for debug purposes, or, depending on the problem, do even crazier things.
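Here is a sketch of the "compute the size first, then process into caller memory" pattern. It uses a hypothetical step that turns rectangles into a triangle mesh. Every rectangle yields exactly 4 vertices and 6 indices, so the output size is known before processing starts. The function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct rect { float x, y, w, h; };   /* processing input */
struct vertex { float x, y; };
struct mesh {
    struct vertex *verts;
    unsigned short *indices;
    size_t vert_count, index_count;
};

/* Output size depends only on the input count: 4 vertices + 6 indices per rect. */
static size_t mesh_required_bytes(size_t rect_count)
{
    return rect_count * (4 * sizeof(struct vertex) + 6 * sizeof(unsigned short));
}

/* Deterministic processing step: reads the input and writes only into
 * caller-provided memory (which must be aligned for float). No allocation. */
static struct mesh mesh_build(const struct rect *r, size_t n, void *memory, size_t size)
{
    static const unsigned short quad[6] = {0, 1, 2, 0, 2, 3};
    assert(size >= mesh_required_bytes(n));
    assert(n <= 16384); /* keeps every index within unsigned short range */
    struct mesh m;
    m.verts = memory;
    m.indices = (unsigned short *)(m.verts + 4 * n);
    m.vert_count = 4 * n;
    m.index_count = 6 * n;
    for (size_t i = 0; i < n; ++i) {
        struct vertex *v = &m.verts[4 * i];
        v[0] = (struct vertex){r[i].x, r[i].y};
        v[1] = (struct vertex){r[i].x + r[i].w, r[i].y};
        v[2] = (struct vertex){r[i].x + r[i].w, r[i].y + r[i].h};
        v[3] = (struct vertex){r[i].x, r[i].y + r[i].h};
        for (size_t k = 0; k < 6; ++k)
            m.indices[6 * i + k] = (unsigned short)(4 * i + quad[k]);
    }
    return m;
}
```

The caller can take `memory` from a persistent allocator, or from the frame allocator if the mesh only lives for one frame. `mesh_build` itself never allocates.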
The last stage is output. Output data is the most complex and hardest part to design in these kinds of APIs. At least so far I have not seen an easy one-size-fits-all solution for this step. One problem, for example, is mapping a specific output value generated by the processing step back to the application data structure that provided the corresponding input. In the best case you would have a 1:1 relation between input and output values, which would make mapping between them easy. However, not all problems are structured that way. So in general some kind of identifier is required for mapping between application data and processed output data, either auto-generated or in the form of a hash table. Another solution, not necessarily applicable under all circumstances, is callbacks. Yet callbacks should never be the only option for inter-component communication. Another question is what format the processing step should output. Once again this highly depends on the problem, but in general it should be as easy as possible to either use the output directly or convert it back into application state. So in the worst case you have to provide an immediate mode conversion API for both input and output.
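One way to map output back to application data is to identify each object by a hashed name and let the processing step key its results by that id. The sketch below uses 32-bit FNV-1a for the ids and a small open-addressing table. The `output` layout, `OUTPUT_SLOTS` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: turns an application-side name into a stable identifier. */
static uint32_t make_id(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical processing output, one entry per identifier. */
struct output { uint32_t id; int clicked; int used; };
#define OUTPUT_SLOTS 64 /* must be a power of two */

/* Processing side: store a result under its id (linear probing). */
static void output_store(struct output *t, uint32_t id, int clicked)
{
    for (uint32_t i = 0; i < OUTPUT_SLOTS; ++i) {
        struct output *o = &t[(id + i) & (OUTPUT_SLOTS - 1)];
        if (!o->used || o->id == id) {
            o->id = id;
            o->clicked = clicked;
            o->used = 1;
            return;
        }
    }
    assert(0 && "output table full");
}

/* Application side: map from its own id back to the processed result. */
static const struct output *output_find(const struct output *t, uint32_t id)
{
    for (uint32_t i = 0; i < OUTPUT_SLOTS; ++i) {
        const struct output *o = &t[(id + i) & (OUTPUT_SLOTS - 1)];
        if (!o->used)
            return NULL;
        if (o->id == id)
            return o;
    }
    return NULL;
}
```

The table deliberately has no delete operation. It is meant to be cleared (e.g. zeroed) and refilled on every processing pass.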
Finally, I want to address an idea that goes beyond simple input, processing and output APIs: if we have multiple of these APIs and chain them together, we get a pipeline. This is quite interesting to me. So far I have always thought about pipelines in the context of applications with text as the method of communication, or at most in the context of combining coroutines to push data through stages. However, seeing nudge basically use a pipeline of API functions to push physics data through definitely sparked my interest. Basically, this is the holy grail of API design. Pipelines in Unix land were later somewhat misunderstood, and applications became bloated because no one cared about orthogonal and diagonal applications (going back to my last post on API design). Still, pipelines are so far the closest we have ever gotten to real modularity, and seeing that it is possible to have them in APIs as well is definitely interesting.
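Here is a toy sketch of chaining data-oriented steps into a pipeline, loosely in the spirit of the physics example. Each stage is a plain function whose output is plain data consumed by the next stage. The stages, types and the 1D "physics" are hypothetical and heavily simplified; this is not nudge's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D bodies and the contacts produced between stages. */
#define MAX_BODIES 16
struct bodies   { float pos[MAX_BODIES], vel[MAX_BODIES]; size_t count; };
struct contacts { size_t body[MAX_BODIES]; float depth[MAX_BODIES]; size_t count; };

/* Stage 1: bodies -> bodies (apply gravity, then integrate position). */
static void integrate(struct bodies *b, float dt)
{
    for (size_t i = 0; i < b->count; ++i) {
        b->vel[i] -= 10.0f * dt;
        b->pos[i] += b->vel[i] * dt;
    }
}

/* Stage 2: bodies -> contacts with a ground plane at 0. */
static void collide(const struct bodies *b, struct contacts *c)
{
    c->count = 0;
    for (size_t i = 0; i < b->count; ++i)
        if (b->pos[i] < 0.0f) {
            c->body[c->count] = i;
            c->depth[c->count] = -b->pos[i];
            c->count++;
        }
}

/* Stage 3: bodies + contacts -> corrected bodies. */
static void solve(struct bodies *b, const struct contacts *c)
{
    for (size_t k = 0; k < c->count; ++k) {
        size_t i = c->body[k];
        b->pos[i] += c->depth[k];
        b->vel[i] = 0.0f;
    }
}

/* The pipeline: each stage's output feeds the next. */
static void step(struct bodies *b, float dt)
{
    struct contacts c;
    assert(b->count <= MAX_BODIES);
    integrate(b, dt);
    collide(b, &c);
    solve(b, &c);
}
```

Because every intermediate result is plain data, a caller could also run the stages individually, inspect or modify the contacts in between, or swap a stage out.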
I think there might be some overlap here between data-oriented design and data parallelism. Data parallelism is the lens used to split up problems elegantly for, e.g., SIMD and multi-core.
I think there's a bit of a nuance here, because you can treat writing GPU commands as a way to build a schedule of future work. This includes scheduling allocations, in the sense that we can prepare a schedule of allocations (and deallocations) in the form of writing GPU commands. For example, this allows you to reuse memory that is no longer necessary in the current frame. Actually, you mention a lot of similar ideas later in the post.
It depends: you might use the GPU as an accelerator to do computations and read the results back. Alternatively, you might build a feedback loop that stays on the GPU side between frames.