Last Updated: 2024-02-22
As the original author and primary developer of the Latios Framework for Unity’s ECS, I regularly run into bugs, inadequate functionalities, and pitfalls within the ECS ecosystem. This has resulted in lots of “hacks” in the framework to patch up problems. This is a living document describing the issues and hacks. My hope is that the Unity ECS team will find this to be a valuable reference to help improve the quality of their packages.
This document is organized into four categories, from greatest to least severity: Ship Stoppers, Feature Blockers and Hindrances, Hacks and Ugliness, and Annoyances.
This document does not include all possible features the Latios Framework may eventually implement if no official solution is provided. If you would like to learn about such features, it is best to ask me.
These are high-severity items that are preventing the Latios Framework from functioning correctly, with no plausible workarounds.
Entities 1.2 breaks determinism in a really bad way. Entity IDs are no longer deterministic, meaning the only way to order a list of entities deterministically is to know both which chunks they belong to AND the indices of those chunks relative to some EntityQuery.
Besides creating a bunch of confusion around whether chunk order determinism even matters (because now it is really hard to preserve) and what the point of the sortKey in ECB is, this change introduces a major problem for the Latios Framework’s development: debugging.
Kinemation relies heavily on chunk components and caching of relationships. When a bug happens, it is crucial to be able to replay the simulation up to the bug to identify the source of the problem. Many of the algorithms Kinemation uses don’t have access to the chunk index and index in chunk for a list of entities collected in parallel. There is no way to order them deterministically in 1.2. That means that chunk order is not preserved, and whether or not two entities lie in the same chunk may change run to run. Thus if the bug was dependent on two entities being in the same or different chunks, the bug is only reproducible by chance. That’s really, really bad.
Full determinism per architecture was one of Unity’s major competitive advantages over other solutions. And now it is being thrown away. Sure, there’s the argument that streaming breaks it, but streaming could also pre-allocate the entities a subscene needs and assign them all a shared component to the loading subscene until the subscene is loaded and the chunks are swapped. If that swap step could also be manually triggered, then subscene streaming could even support lock-step.
Unity is going to have to offer significantly more value to compensate for this regression, otherwise I will probably stay at 1.1.
These are items that prevent, or create unnecessary friction for, the development of specific new features or optimizations of the Latios Framework. They may also be creating undesirable effects on usage of the framework and reducing the quality of the overall solution.
Aspect lookups are not only difficult to discover, but using them in IJobEntity requires a ton of boilerplate compared to ComponentLookup. There are no SystemAPI methods for getting auto-cached handles or lookups.
For performance reasons, the Latios Framework is starting to have really complicated sets of components. A common example is that the Latios Framework triple-buffers animation data so that things like motion vectors and inertial blending can be easily evaluated. Yet rather than copy current to previous and previous to two-ago, these buffers rely on control components to rotate the roles. This behavior should be somewhat abstracted from the user, and IAspect solves this case beautifully. Unfortunately, users struggle to acquire such aspects from random entities in jobs.
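As a rough sketch of the friction (type and member names here are assumptions, including the hypothetical AnimationAspect and its generated nested Lookup type), compare the one-call ComponentLookup path against what an aspect lookup demands:

```csharp
// Sketch only: contrast between ComponentLookup and an aspect lookup.
// "AnimationAspect" is a hypothetical aspect; the nested Lookup type and
// its members are assumptions based on current codegen output.
partial struct MySystem : ISystem
{
    ComponentLookup<LocalToWorld> m_ltw;   // discoverable, one call to create
    AnimationAspect.Lookup        m_anim;  // generated type, no SystemAPI helper

    public void OnCreate(ref SystemState state)
    {
        m_ltw  = state.GetComponentLookup<LocalToWorld>(true);
        m_anim = new AnimationAspect.Lookup(ref state);
    }

    public void OnUpdate(ref SystemState state)
    {
        m_ltw.Update(ref state);
        m_anim.Update(ref state);
        // Both must then be manually passed into any IJobEntity that needs them.
    }
}
```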
The fact that we can’t use SystemAPI in static methods makes it extremely difficult to build extensions and common patterns. For example, I have a static method Physics.BuildCollisionLayer() that needs to schedule 5 jobs in sequence. While there are several variants, one variant requires the first job to perform chunk iteration. Securing such type handles is extremely problematic. The user has to manually cache and update a struct containing those handles, because this method can’t rely on SystemAPI. That’s a lot of unnecessary boilerplate burdened directly on the user.
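To illustrate the burden, here is a minimal sketch of the kind of handle-caching struct such a method forces onto callers (component names are placeholders; the real BuildCollisionLayer requirements differ):

```csharp
// Sketch of user-side boilerplate when a static utility cannot use SystemAPI:
// the caller owns a struct of type handles and must remember to Update() it
// every frame before invoking the utility.
struct BuildLayerTypeHandles
{
    public ComponentTypeHandle<LocalToWorld> localToWorld;  // placeholder component
    public EntityTypeHandle                  entity;

    public BuildLayerTypeHandles(ref SystemState state)
    {
        localToWorld = state.GetComponentTypeHandle<LocalToWorld>(true);
        entity       = state.GetEntityTypeHandle();
    }

    // Must be called in OnUpdate before the utility schedules its jobs.
    public void Update(ref SystemState state)
    {
        localToWorld.Update(ref state);
        entity.Update(ref state);
    }
}
```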
Source generators are an incredibly powerful tool. The sad part is that up until recently, the Latios Framework never used them. Why? Because there’s no documentation on how to use them to solve ECS-specific problems. Sure, in the manual you can find a page on how to set up source generators using older Roslyn, but a more modern tutorial would have solved a lot of problems the Latios Framework faced a lot sooner. Unity now supports Roslyn 4.0 and incremental source generators, and the Latios Framework is using exactly that to add new IComponentData to partial structs implementing specific interfaces. The workflow is surprisingly good once you understand it. A little documentation on how to do some common things to mitigate the use of runtime reflection and generics would go a long way.
However, not every problem is solved. The IAspect and SystemAPI issues could almost be solved by users if there was a way to add additional OnCreateForCompiler methods to systems. This could probably be done by decorating custom methods that should run at that time point with an attribute and having ILPP pick up on them.
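A sketch of what that could look like (the attribute and the ILPP behavior are purely hypothetical):

```csharp
// Hypothetical: ILPP would discover methods with this attribute and invoke
// them from the generated OnCreateForCompiler.
[System.AttributeUsage(System.AttributeTargets.Method)]
public class OnCreateForCompilerExtensionAttribute : System.Attribute { }

public partial struct MySystem : ISystem
{
    ComponentLookup<LocalToWorld> m_ltwLookup;

    [OnCreateForCompilerExtension]  // hypothetical attribute
    void CacheLookups(ref SystemState state)
    {
        // User- or generator-authored caching, no SystemAPI needed.
        m_ltwLookup = state.GetComponentLookup<LocalToWorld>(true);
    }
}
```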
Subscene import workflows have significant usability issues. Because they occur in a separate Unity process, they do not use Burst, cannot easily be debugged, have limited reporting of memory leaks and the like, and many engine features are not well tested when accessed in this mode (it took 3 years for the audio crash bug to be fixed).
The Latios Framework pushes the boundaries of what can be baked, with new and exciting high-level features. But that only works when baking itself works, which has been a constant pain point.
In MonoBehaviours, you can do GetComponent<ISomeInterface>(). In bakers, this isn’t possible. Why?
Most of the time, I want a baker to check if some interface exists on the same Game Object, and if so, early out so that another Baker that processes the interface can work unhindered.
I started using FixedString and BlobArray<byte> in blobs because I couldn’t log BlobStrings in Burst-compiled code. There are a lot of missing APIs and features for BlobStrings. Make them better so that I can be more efficient with my data.
I have a task where I have M arrays of bytes and a separate set of N arrays of bytes. For each array in M, I need to find an array in N that starts with all the bytes of the array from M. Currently, I’m using UnsafeUtility.MemCmp in O(n^2) fashion. I believe that sorting M and N by raw byte values would lead to a faster algorithm, but can MemCmp be used for this kind of sorting? Is there a better approach?
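For what it’s worth, a MemCmp-style byte comparison (compare the common length, then tie-break on length) does yield a lexicographic order, and with N sorted that way, every array starting with a given prefix forms a contiguous range beginning at the prefix’s lower bound. A plain C# sketch of that idea:

```csharp
// Sketch: sort N lexicographically once, then binary-search each M prefix.
// Roughly O((M + N) log N) comparisons instead of O(M * N).
static int CompareLex(byte[] a, byte[] b)
{
    int n = Math.Min(a.Length, b.Length);
    for (int i = 0; i < n; i++)
        if (a[i] != b[i]) return a[i] - b[i];  // MemCmp-style byte compare
    return a.Length - b.Length;                // shorter sorts first on ties
}

static bool StartsWith(byte[] a, byte[] prefix)
{
    if (a.Length < prefix.Length) return false;
    for (int i = 0; i < prefix.Length; i++)
        if (a[i] != prefix[i]) return false;
    return true;
}

// sortedN must be sorted with CompareLex. Returns an index into sortedN of
// an array starting with prefix, or -1 if none exists.
static int FindWithPrefix(byte[][] sortedN, byte[] prefix)
{
    int lo = 0, hi = sortedN.Length;
    while (lo < hi)  // lower bound: first element >= prefix
    {
        int mid = (lo + hi) / 2;
        if (CompareLex(sortedN[mid], prefix) < 0) lo = mid + 1;
        else hi = mid;
    }
    return (lo < sortedN.Length && StartsWith(sortedN[lo], prefix)) ? lo : -1;
}
```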
A huge optimization I made with Kinemation’s renderer is writing to GraphicsBuffers and dispatching compute shaders inside the culling loop instead of before it. This way I take culling results into account and do significantly less work. This applies to skinning, material properties, blend shapes, other mesh deformations, and whatever else. The only problem is that now I have to complete culling jobs inside the culling callbacks so that I can end GraphicsBuffer writes and dispatch the compute shaders. It would be awesome if BatchRendererGroup could get an additional callback for when the jobs need to be completed so that I could do these compute shader dispatches as late as possible. SRP shenanigans are a big chunk of my frame time and the worker threads are starved.
It would be awesome if I could see an outline of all functions a Burst job compiled and jump between them. Right now it is still difficult to understand what is happening in critical sections of code in massive jobs.
Why does this not exist?
This triggers an error if the entity array creates a batch. It is really annoying because then I have to handle chunk components separately, which is an extra structural change.
I have to move an entity, or more often an array of entities, twice if I have both a set of components to add and a set of components to remove.
If a zero-sized component is added or removed on all entities in a chunk, the chunk’s archetype is converted in-place, which is a great optimization. However, when there are only a couple of entities in the chunk because the source archetype represents a temporary state, this in-place conversion leaves lots of chunks with only a small number of entities each, causing fragmentation. It would be awesome if, as an additional check, when another chunk can accommodate all of the existing chunk’s entities, the entities were moved to that chunk rather than converted in-place.
Most of the time, I find myself using sub-optimal structural change sequences just to avoid this edge case.
I have a job where there is a loop, and inside that loop is a callstack where each stack frame has logic that needs to perform allocations. The allocations add up fast, but usually have very short lifecycles and are mutually exclusive to each other. I would love special allocators that can be rewound directly in the job to recycle the memory efficiently.
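What I have in mind is roughly a mark-and-rewind bump allocator. A minimal self-contained sketch (not a real Unity API; a production version would need safety checks and a real memory source):

```csharp
// Sketch: a rewindable bump allocator usable inside a job. Each stack frame
// records a mark, allocates scratch memory freely, and rewinds on exit,
// recycling all of it in O(1).
unsafe struct BumpAllocator
{
    byte* m_buffer;
    int   m_capacity;
    int   m_offset;

    public BumpAllocator(byte* buffer, int capacity)
    {
        m_buffer   = buffer;
        m_capacity = capacity;
        m_offset   = 0;
    }

    public int  Mark()             => m_offset;
    public void RewindTo(int mark) => m_offset = mark;  // O(1) bulk free

    public byte* Allocate(int bytes, int alignment = 16)
    {
        m_offset = (m_offset + alignment - 1) & ~(alignment - 1);
        if (m_offset + bytes > m_capacity)
            throw new System.InvalidOperationException("Out of scratch memory");
        byte* ptr = m_buffer + m_offset;
        m_offset += bytes;
        return ptr;
    }
}
```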
WorldUpdateAllocator doesn’t get rewound in baking worlds. Therefore, we have to use TempJob allocations everywhere when baking.
Sometimes, I really want to specify that a container in a job is meant to be allocated in the job via Allocator.Temp, without having to use the nuclear attribute [NativeDisableContainerSafetyRestriction]. Other times, I might have a job that takes a variable number of DynamicComponentTypeHandles, and I never know what to populate the unused slots with. Again, disabling container safety is really bad, because then if the user messes up job dependencies elsewhere, the issue may go unnoticed.
IJobEntityChunkBeginEnd doesn’t support a derived interface that uses default interface methods, because the source generators generate code that directly calls the methods rather than use a generic static invoker.
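For context, this is the shape of the pattern that fails today (the method signatures are my recollection of the current interface and may not match exactly):

```csharp
// Sketch: a derived interface providing default implementations. The
// IJobEntity source generator emits direct calls to OnChunkBegin/OnChunkEnd,
// bypassing these defaults instead of dispatching through a generic invoker.
public interface IAutoChunkProfiling : IJobEntityChunkBeginEnd
{
    bool IJobEntityChunkBeginEnd.OnChunkBegin(
        in ArchetypeChunk chunk, int unfilteredChunkIndex,
        bool useEnabledMask, in v128 chunkEnabledMask)
    {
        // Shared per-chunk setup would go here.
        return true;  // process the chunk
    }

    void IJobEntityChunkBeginEnd.OnChunkEnd(
        in ArchetypeChunk chunk, int unfilteredChunkIndex,
        bool useEnabledMask, in v128 chunkEnabledMask, bool chunkWasExecuted)
    {
        // Shared per-chunk teardown would go here.
    }
}
```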
I’m listing this here as this is a potential item on my todo list. I shouldn’t have to beat you to it. But you left such a mess with LODs, from back when you tried to do hierarchical LODs and then abandoned them, that now you don’t even want to touch it again.
I believe Dynamic Buffers should be allocated from a custom ECS-managed allocator and not using Allocator.Persistent. The allocations are often small and would be better suited to a pool. This is starting to become a performance problem for me during initialization due to all the consecutive small allocations in my ICleanupBufferElement types.
These are some of the other issues I have run into with the Latios Framework that required explicit workarounds that were far from ideal.
We have ICustomBootstrap for setting up systems at runtime. Why can’t we do the same thing in the Editor? I ended up extending ECS to do that, but I do it by accessing some internal Action after the Editor World is created and then trying to replace it. And then I also have a hack to rebuild the EditorWorld as a menu option when a buggy editor system goes haywire and the full Editor state is corrupted. Unfortunately, this hack isn’t bullet-proof and sometimes causes the wrong world to run an update or two, which then fails and throws errors in the console.
Then there’s baking. I have a custom Skinned Mesh Rendering solution. Why can’t I turn off the built-in Entities Graphics Skinned Mesh Renderer baking without turning off the entire baking of Entities Graphics? Once again, I hacked this by using a custom baker list mechanism that seemed to be created for tests. I do this at startup to create a custom bootstrap callback, and then for each baking world, I have a system in OnCreate assign a RateManager to one of the first ComponentSystemGroups baking uses, and then in that callback I disable systems I don’t want and inject systems with the DisableAutoCreation attribute. Why do those systems have that attribute? It is because they are for an optional feature that users may or may not want. Why do I use a RateManager? It is the only way to ensure the already included ComponentSystemGroups have had their OnCreate() called when I inject the systems, because otherwise I can’t add systems to them.
And while we are on this topic, I would greatly appreciate a flag in the UpdateBefore/After attributes to suppress warnings about the systems being in the wrong groups. Such systems might just not be installed at all. A user may have replaced one with a custom version or something. Bonus points if the warnings can be suppressed externally.
Personally, I think the whole bottom-up automatic injection design of systems is problematic. It makes it difficult for users to optimize system ordering for better worker thread occupancy, unless they want to decorate their systems with false dependencies. It becomes impossible to know just by looking at the code what the actual order of systems is if there’s a bug where some data is getting changed in the wrong place. And it makes it really hard to copy a system into a different project. Also, how do you define a system to run more than once in a frame?
A top-down approach solves all these problems, and the Latios Framework has the mechanisms in-place to support this. Unfortunately, this conflicts with a lot of existing paradigms. I don’t know the right answer.
Lastly, the whole ICustomBootstrap thing does not play well with embedded samples inside of packages. Bootstraps should be settings assets that can be swapped in the Editor. This feature is planned for a future Latios Framework version, but I wish I didn’t have to be the one to do it.
Why are collections married to singletons? Why are there even singletons? Do you truly only want one of something, or do you just want to know which entity is the entity? The Latios Framework solves these use cases independently with blackboard entities and collection components. The latter has similar problems as managed structs, except this time user API is fully Burst-compatible. But if you are from Unity and want to do something more official, please reach out to me!
Currently the Latios Framework has this SmartBlobber mechanism for creating blob assets in baking systems based on a “request” protocol. For each blob type, the user has to register the type so that a generic system can properly ref-count and store blobs in the BlobAssetStore (deduplicating in the process).
I currently face two problems that I have hacked around. First, adding concrete types to BlobAssetStore is not Burst-compatible. I have to use internal APIs to precompute the type hash prior to the job. Second, I would much rather add UnsafeUntypedBlobAssetReference blobs directly so that I don’t need generics. Honestly, I think the BlobAssetStore should use Burst’s type hashes instead of System.Type.GetHashCode and expose that as API for working with UnsafeUntypedBlobAssetReference.
The Latios Framework Smart Blobbers are a powerful concept. They allow baking systems to generate blob assets without necessarily knowing nor caring how those blob assets will be used. User bakers can request blob assets to be created. Baking systems create the blob assets, then pass the blobs back to the user to do what they please. The issue is how to pass those blobs back to the user without making the user write a custom baking system, which is error-prone. The solution I came up with is to create a generic baking system and a “bake item”. The bake item is a stateful IComponentData which does the original baking, and then later receives a callback with a reference to EntityManager and the primary entity to resolve any blob asset requests and assign them to components. This works, but it involves generic systems, and it is still somewhat unsafe. Ideally, there would be some way to have additional baker callbacks dispatched by a baking system. And inside these baker callbacks, the baker would only be allowed to change or remove components it added. I’m open to ideas for improvements and/or alternatives!
Transforms used to be a high severity item. But Pre.65 addressed much of the mess. What we have today is a heavily streamlined and simplified version of Transforms V1’s execution model. And while I believe better can be done (I wrote my own QVVS transform system that I really like while V2 was in chaos), design-wise I think we are finally on the right track again.
But there are some lingering issues with Unity’s implementation that have plagued every iteration of their Transforms. Fortunately, they are easy to fix without any API change, but like come on. Just fix them!
First, LocalToWorld does not need to be a float4x4. A float3x4 is sufficient, and will have better rendering performance. Additionally, ParentSystem and LocalToWorldSystem are both non-deterministic for no good reason. Both can be made deterministic and faster. And the Child buffer has a default capacity of 8. Why does it need that much chunk space?
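On the float3x4 point, the conversion is trivial because an affine transform’s bottom row is always (0, 0, 0, 1); dropping it saves 16 bytes per entity and memory bandwidth on the way to rendering. A sketch using Unity.Mathematics types:

```csharp
// Sketch: a float3x4 carries the full affine transform
// (stored as 4 float3 columns), omitting the constant bottom row.
static float3x4 ToFloat3x4(in float4x4 m)
{
    return new float3x4(m.c0.xyz, m.c1.xyz, m.c2.xyz, m.c3.xyz);
}

static float3 TransformPoint(in float3x4 m, float3 p)
{
    // Rotation/scale from the first three columns, translation from the last.
    return m.c0 * p.x + m.c1 * p.y + m.c2 * p.z + m.c3;
}
```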
An optional feature of the Latios Framework is a custom scene manager that automatically destroys runtime-created entities. This scene manager is focused on actual scenes, not subscenes. And it is designed for such scenes to be swapped synchronously. It is really important that the subscenes set to auto-load load synchronously, as there are first-frame-of-scene flags that a lot of gameplay features like to use when using real scenes and the scene manager. Right now, my solution for this involves fetching all the entities with RequestSceneLoaded, adding the BlockOnStreamIn flag, and then iterating through the ResolvedSectionEntity buffer and adding the same flag to all those entities. That last part seems unnecessary and wrong. Am I missing something?
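For reference, the workaround looks roughly like this (simplified; assumes main-thread EntityManager access and that every section entity carries its own RequestSceneLoaded):

```csharp
// Sketch of the current workaround: set BlockOnStreamIn on the scene entity,
// then mirror it onto every resolved section entity.
static void BlockOnAllSections(EntityManager em, Entity sceneEntity)
{
    var request = em.GetComponentData<RequestSceneLoaded>(sceneEntity);
    request.LoadFlags |= SceneLoadFlags.BlockOnStreamIn;
    em.SetComponentData(sceneEntity, request);

    if (!em.HasComponent<ResolvedSectionEntity>(sceneEntity))
        return;

    var sections = em.GetBuffer<ResolvedSectionEntity>(sceneEntity)
                     .ToNativeArray(Allocator.Temp);
    foreach (var section in sections)
    {
        var sectionRequest = em.GetComponentData<RequestSceneLoaded>(section.SectionEntity);
        sectionRequest.LoadFlags |= SceneLoadFlags.BlockOnStreamIn;
        em.SetComponentData(section.SectionEntity, sectionRequest);
    }
}
```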
A really common use case is to procedurally generate meshes for Mesh Renderers. While the algorithms work fine in bakers, getting the Mesh Renderer baker to accept this and not bake a null mesh or something would be great. Currently, I replaced the MeshRenderer baker with a custom version which checks that there is not a subclass of some other MonoBehaviour before continuing. If there is, it leaves it up to that other MonoBehaviour to do custom baking instead, providing the custom mesh and list of materials to use for the renderer. You can see this in action in LSSS, as all the capsules are generated procedurally at bake time.
Entities Graphics baking is just buggy in general. The baking system adds components that don’t get removed by reversion. RenderMesh works on lists of materials, causing lots of defensive reallocations to avoid accidental sharing, only for deduplication to consider just the first material. And if any of the materials are transparent, all the opaque materials on the same mesh get rendered with split batches.
The range MMI mechanism at runtime is actually an excellent and powerful feature. But the baking makes a total mess of it. I personally rewrote the entire baking stack to fix all of this, and everything is amazing again.
Psyshock uses generic jobs in Physics.FindPairs() using a pattern that allows Burst to detect and compile the jobs both in the Editor and in builds without having to explicitly register the generic types with attributes. Unfortunately, the ILPP can’t pick up on it and patch these jobs to be Burst-schedulable. There should not be a discrepancy!
Currently I am relying on reflection to find and call the EarlyJobInit() methods myself for specific generic types.
The whole Skinned Mesh Rendering solution in Entities Graphics is problematic. It generates GC every frame, it doesn’t scale, and even the public API types of SkinMatrix and BlendShapeWeight fundamentally prevent more efficient algorithms like the ones Kinemation uses. I’ve been told numerous times that the skinned mesh rendering design is “experimental”. If that’s the case, why is it in the released version of Entities Graphics without any guard flags?
I’m only asking this because I have a sliver of hope that Entities Graphics may adopt a design closer to Kinemation in which case I can delegate some features of Kinemation to the official package.
I frequently run into issues where default groups accidentally get added to other groups if I don’t explicitly remove them from the list. Since these are systems that Unity will manually create, they should have a [DisableAutoCreation] attribute. At least now that they are partial, I am able to fix this with asmref.
If you try to get all systems, the systems with the DisableAutoCreation attribute don’t get added to the list even if you specify All like the XML documentation suggests. I have to use reflection for this now, which sucks.
Is there a more performant way to get the raw blend shapes data (the deltas, not the animated parameters) than queueing up a bunch of async readbacks and then batch-completing them inside a baking system?
I want to bake audio clip samples into blob assets. The current API doesn’t offer any NativeArray methods and is slow. Also, if I could get the raw compressed bytes and compression codec of audio clips, I could do my own decompression at runtime without having to do my own compression. That would be awesome!
One of the features of blackboard entities in the Latios Framework is that they merge components of blackboard config entities whenever a subscene containing them loads. This allows the user to spread config authoring data across multiple GameObjects. However, doing this merging at runtime is surprisingly difficult. While it is easy to get the ComponentType list to copy from one entity to another, it is significantly more difficult to actually copy those types. For unmanaged components, we have the tools now. But for managed components, and especially shared components, it is problematic. Currently, the Latios Framework uses reflection, but I would love for there to be a proper EntityManager.AddComponentFromOtherEntity(Entity src, Entity dst, ComponentType ct) API so that I can Burst-compile this whole thing.
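A hypothetical shape for that API (nothing like this exists today; the point is a by-ComponentType copy that covers managed and shared components without generics):

```csharp
// Hypothetical extension illustrating the requested API surface. The actual
// copy would have to live inside the entities package, where the managed and
// shared component stores are accessible without reflection.
public static class EntityManagerCopyExtensions
{
    public static void AddComponentFromOtherEntity(
        this EntityManager em, Entity src, Entity dst, ComponentType ct)
    {
        em.AddComponent(dst, ct);
        // Hypothetical internal call that copies the data for any category
        // of component (unmanaged, managed, shared, buffer) by type index:
        // em.CopyComponentDataRaw(src, dst, ct);
    }
}
```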
You can get a read-only pointer to components in a chunk, even if the ComponentTypeHandle is declared with write access. You cannot do the same for a BufferAccessor.
I had a bug where queries weren't being matched because of this. The bug only happened in an editor system while the subscene was open. It was really annoying.
These are little things in the API I think should be improved, but don’t have a major impact on the Latios Framework.
These are a mess. There are lots of combinations that aren’t supported correctly, and lots of other cases where the dependencies aren’t brought in correctly. WithPresent fixed a lot of issues, but not everything respects it. And there are still dependency problems, especially with WithAny on enabled states.
I have large chunk components. Reading/Writing by ref is way faster. I have extensions to do this, but official support would be better.
Similarly, I’ve also noticed ref gaps for EntityManager.
The biggest issue I have with idiomatic foreach is that it is really clunky for large queries. With Entities.ForEach, you could put each argument (type and variable name) on a different line. That doesn’t really work well with idiomatic foreach. I recognize this is a hard problem, and I don’t have a proposed solution yet.
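To make the complaint concrete, here is roughly what a wide query looks like today (component names other than LocalTransform are placeholders): the variable names live in one tuple and the types in a separate list, so they drift apart as the query grows.

```csharp
// Illustration only: a large idiomatic foreach. Matching the fourth name in
// the tuple to the fourth type in the query is left to the reader.
foreach (var (transform, velocity, mass, health, entity) in
         SystemAPI.Query<RefRW<LocalTransform>,
                         RefRO<Velocity>,
                         RefRO<Mass>,
                         RefRW<Health>>()
                  .WithEntityAccess())
{
    // ...
}
```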
But at the very least, make it so that we can have Entity first in the tuple. It is difficult to articulate why, but the Entity being at the end annoys me and most others I talk to.
As an alternate workflow to idiomatic foreach, I have thought about defining an IAspect for each foreach loop. Is there a way to tell an IAspect to be hidden in the inspectors?
Other things I want are to be able to iterate chunks without fetching a NativeArray<ArchetypeChunk> and to provide a custom query to idiomatic foreach (or extract the query from it).
Codegen already injects the ComponentTypeHandles into IJobEntity. Can we have an [Inject] attribute for lookups and Time to have codegen do the same? That would reduce a bunch of boilerplate.
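Something along these lines (the attribute, the field injection, and the component names are purely hypothetical):

```csharp
// Hypothetical: codegen would populate these fields when the job is
// scheduled, just as it already does for the Execute parameters' handles.
[System.AttributeUsage(System.AttributeTargets.Field)]
public class InjectAttribute : System.Attribute { }

public partial struct ApplyDamageJob : IJobEntity
{
    [Inject] public ComponentLookup<LocalToWorld> ltwLookup;  // hypothetical
    [Inject] public float deltaTime;                          // hypothetical Time injection

    void Execute(Entity entity, ref Health health)  // Health is a placeholder
    {
        // ...
    }
}
```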
I keep finding these, and they always catch me off guard in custom bootstraps. Put them where they belong.
This isn’t possible in idiomatic foreach. I have to have some additional dummy read component around, or deep copy the entity array.
We have to allocate a NativeArray<ArchetypeChunk> if we want to do this. Can we get a proper enumerator?
NativeStream doesn’t respect alignment, gets its counts messed up when writing piecewise but reading in bulk (or vice-versa), can’t store writes of 4 kB or greater, can’t defer allocation with a schedule-time-known allocation size, etc.