@Dreaming381
Last active February 22, 2024 19:26
Continuous Feedback for the ECS Team

Latios Framework Unity ECS Wishlist

Last Updated: 2024-2-22

As the original author and primary developer of the Latios Framework for Unity’s ECS, I regularly run into bugs, inadequate functionalities, and pitfalls within the ECS ecosystem. This has resulted in lots of “hacks” in the framework to patch up problems. This is a living document describing the issues and hacks. My hope is that the Unity ECS team will find this to be a valuable reference to help improve the quality of their packages.

This document is organized into four categories: Ship Stoppers, Feature Blockers and Hindrances, Hacks and Ugliness, and Annoyances, ordered from greatest to least severity.

This document does not include all possible features the Latios Framework may eventually implement if no official solution is provided. If you would like to learn about such features, it is best to ask me.

Ship Stoppers

These are high-severity items that are preventing the Latios Framework from functioning correctly, with no plausible workarounds.

Determinism

Entities 1.2 breaks determinism in a really bad way. Entity IDs are no longer deterministic, meaning the only way to order a list of entities in a deterministic way is to know both which chunks they belong to AND know the indices of those chunks relative to some EntityQuery.

Besides creating a bunch of confusion around whether chunk order determinism even matters (because now it is really hard to preserve) and what the point of the sortKey in ECB is, this new change introduces a major problem in the Latios Framework’s development, which is debugging.

Kinemation relies heavily on chunk components and caching of relationships. When a bug happens, it is crucial to be able to replay the simulation up to the bug to identify the source of the problem. Many of the algorithms Kinemation uses don’t have access to the chunk index and index in chunk for a list of entities collected in parallel. There is no way to order them deterministically in 1.2. That means that chunk order is not preserved, and whether or not two entities lie in the same chunk may change run to run. Thus if the bug was dependent on two entities being in the same or different chunks, the bug is only reproducible by chance. That’s really, really bad.
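To make the ordering problem concrete, here is a minimal Python sketch (not Unity code). It assumes each collected record carries a hypothetical `chunk_index` (relative to some query) and `index_in_chunk`. With both keys available, a list gathered in arbitrary thread order sorts to the same sequence every run. The entity ID is useless as a key once it varies between runs. All names here are illustrative.

```python
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class CollectedEntity:
    entity_id: int        # may differ run to run, so it is not a usable sort key
    chunk_index: int      # hypothetical: index of the chunk within a query
    index_in_chunk: int   # hypothetical: slot of the entity inside that chunk

def deterministic_order(collected):
    """Sort by (chunk index, index in chunk), ignoring the unstable entity ID."""
    return sorted(collected, key=lambda e: (e.chunk_index, e.index_in_chunk))

def simulate_run(seed):
    """Simulate one run: same chunk layout, different IDs and arrival order."""
    rng = random.Random(seed)
    layout = [(c, i) for c in range(3) for i in range(4)]
    ids = rng.sample(range(1000), len(layout))   # stand-in for non-deterministic IDs
    collected = [CollectedEntity(eid, c, i) for eid, (c, i) in zip(ids, layout)]
    rng.shuffle(collected)                       # stand-in for parallel collection order
    return collected

run_a = [(e.chunk_index, e.index_in_chunk) for e in deterministic_order(simulate_run(1))]
run_b = [(e.chunk_index, e.index_in_chunk) for e in deterministic_order(simulate_run(2))]
```

Both runs end up in the same order only because the two chunk keys exist on every record. An algorithm that collects entities without those keys has nothing stable to sort on.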

Full determinism per architecture was one of Unity’s major competitive advantages over other solutions. And now it is being thrown away. Sure, there’s the argument that streaming breaks it, but streaming could also pre-allocate the entities a subscene needs and assign them all a shared component to the loading subscene until the subscene is loaded and the chunks are swapped. If that swap step could also be manually triggered, then subscene streaming could even support lock-step.

Unity is going to have to offer significantly more value to compensate for this regression, otherwise I will probably stay at 1.1.

Feature Blockers and Hindrances

These are items that prevent specific new features or optimizations of the Latios Framework from being developed, or that create unnecessary friction in developing them. They may also cause undesirable effects on usage of the framework and reduce the quality of the overall solution.

IAspect Feature Gaps

Aspect lookups are not only difficult to discover, but using them in IJobEntity requires a ton of boilerplate compared to ComponentLookup. There are no SystemAPI methods for getting auto-cached handles or lookups.

For performance reasons, the Latios Framework is starting to have really complicated sets of components. A common example is that the Latios Framework triple-buffers animation data so that things like motion vectors and inertial blending can be easily evaluated. Yet rather than copy current to previous and previous to two-ago, these buffers rely on control components to rotate the roles. This behavior should be somewhat abstracted from the user, and IAspect solves this case beautifully. Unfortunately, users struggle to acquire such aspects from random entities in jobs.
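The role-rotation idea can be sketched in a few lines of Python. This is an illustration of the general technique, not the framework's actual component layout. The class, the `rotation` field standing in for a control component, and the property names are all hypothetical.

```python
class RotatingTripleBuffer:
    """Three fixed buffers whose roles rotate via a control value instead of copying."""

    def __init__(self, size):
        self.buffers = [[0.0] * size for _ in range(3)]
        self.rotation = 0  # hypothetical control value: which buffer is "current"

    def _slot(self, age):
        # age 0 = current, 1 = previous, 2 = two-ago
        return (self.rotation - age) % 3

    @property
    def current(self):
        return self.buffers[self._slot(0)]

    @property
    def previous(self):
        return self.buffers[self._slot(1)]

    @property
    def two_ago(self):
        return self.buffers[self._slot(2)]

    def advance(self):
        """Begin a new frame: current becomes previous, previous becomes two-ago.
        The old two-ago buffer is recycled as the new current; no data is copied."""
        self.rotation = (self.rotation + 1) % 3
```

The accessor properties play the part an aspect would play for the user: callers ask for "current" or "previous" and never touch the rotation value directly.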

SystemAPI Extensibility

The fact that we can’t use SystemAPI in static methods makes it extremely difficult to build extensions and common patterns. For example, I have a static method Physics.BuildCollisionLayer() that needs to schedule 5 jobs in sequence. While there are several variants, one variant requires the first job to perform chunk iteration. Securing such type handles is extremely problematic. The user has to manually cache and update a struct containing those handles, because this method can’t rely on SystemAPI. That’s a lot of unnecessary boilerplate burdened directly on the user.

Codegen Accessibility

Source generators are an incredibly powerful tool. The sad part is that up until recently, the Latios Framework never used them. Why? Because there’s no documentation on how to use them to solve ECS-specific problems. Sure, in the manual you can find a page on how to set up source generators using older Roslyn, but a more modern tutorial would have solved a lot of problems the Latios Framework faced a lot sooner. Unity now supports Roslyn 4.0 and incremental source generators, and the Latios Framework is using exactly that to add new IComponentData to partial structs implementing specific interfaces. The workflow is surprisingly good once you understand it. A little documentation on how to do some common things to mitigate the use of runtime reflection and generics would go a long way.

However, not every problem is solved. The IAspect and SystemAPI issues could almost be solved by users if there was a way to add additional OnCreateForCompiler methods to systems. This could probably be done by decorating custom methods that should run at that time point with an attribute and having ILPP pick up on them.

Subscene Imports

Subscene import workflows have significant usability issues. Because they occur in a separate Unity process, they do not use Burst, cannot easily be debugged, have limited reporting capabilities of memory leaks and the like, and many engine features are not well tested when accessed in this mode (it took 3 years for the audio crash bug to be fixed).

The Latios Framework pushes the boundaries of what can be baked, with new and exciting high-level features. But that only works when baking itself works, which has been a constant pain point.

Why can’t I Get Interfaces in Bakers?

In MonoBehaviours, you can do GetComponent<ISomeInterface>(). In bakers, this isn’t possible. Why?

Most of the time, I want a baker to check if some interface exists on the same Game Object, and if so, early out so that another Baker that processes the interface can work unhindered.

BlobStrings

I started using FixedString and BlobArray<byte> in blobs because I couldn’t log BlobStrings in Burst-compiled code. There are a lot of missing APIs and features for BlobStrings. Make them better so that I can be more efficient with my data.

Nonzero Results of UnsafeUtility.MemCmp

I have a task where I have a set M of byte arrays and a separate set N of byte arrays. For each array in M, I need to find an array in N that starts with all the bytes in the array from M. Currently, I’m using UnsafeUtility.MemCmp in brute-force O(M·N) fashion. But I believe that sorting M and N by raw byte values would lead to a faster algorithm. Can the nonzero results of MemCmp be used for this kind of sorting? Is there a better approach?
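For reference, here is a rough Python sketch of the sort-then-search idea. It assumes a comparison that orders arrays lexicographically by unsigned byte value, with a shorter array sorting before any longer array it prefixes. C's `memcmp` reports that ordering through the sign of its result for equal-length spans. Whether UnsafeUtility.MemCmp guarantees the same is exactly the open question, so treat that as an assumption. Python's `bytes` comparison has these semantics built in, and the function name is illustrative.

```python
from bisect import bisect_left

def find_prefix_matches(m_arrays, n_arrays):
    """For each array in M, return the index (into n_arrays) of one array in N
    that starts with it, or None when no array in N has that prefix."""
    # Sort N by raw byte values, remembering each array's original index.
    order = sorted(range(len(n_arrays)), key=lambda i: n_arrays[i])
    sorted_n = [n_arrays[i] for i in order]

    results = []
    for m in m_arrays:
        # Every array that starts with m sorts at or after m, and all of them
        # are contiguous, so only the first element >= m needs to be checked.
        pos = bisect_left(sorted_n, m)
        if pos < len(sorted_n) and sorted_n[pos].startswith(m):
            results.append(order[pos])
        else:
            results.append(None)
    return results
```

This costs one sort of N plus a binary search per array in M, instead of comparing every pair. Sorting M as well would allow a single merge-style pass over both lists, but it is not required for the search to work.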

Ending Writes to Graphics Buffers

A huge optimization I made with Kinemation’s renderer is writing to Graphics Buffers and dispatching compute shaders inside the culling loop instead of before it. This way I take culling results into account and do significantly less work. This applies to skinning, material properties, blend shapes, other mesh deformations, and whatever else. Only problem is that now I have to complete culling jobs inside the culling callbacks so that I can end GraphicsBuffer writes and dispatch the compute shaders. It would be awesome if BatchRendererGroup could get an additional callback when the jobs need to be completed so that I could do these compute shader dispatches as late as possible. SRP shenanigans are a big chunk of my frame time and the worker threads are starved.

Large Burst Jobs with Lots of Functions Difficult to Navigate

It would be awesome if I could see an outline of all functions a Burst job compiled and jump between them. Right now it is still difficult to understand what is happening in critical sections of code in massive jobs.

No NativeArray.DisposeJob for CollectionHelper

Why does this not exist?

Adding/Removing ComponentTypeSet containing chunk components to a NativeArray<Entity>

This triggers an error if the entity array creates a batch. It is really annoying because then I have to handle chunk components separately, which is an extra structural change.

Can’t Add and Remove Components in a Single Structural Change

I have to move an entity, or more often an array of entities twice if I have both a set of components to add and a set of components to remove.

Adding/Removing Tag Components Causes Fragmentation

If a zero-sized component is added or removed on all entities in the chunk, the chunk’s archetype is converted in-place, which is a great optimization. However, when there are only a couple of entities in the chunk because the source archetype represents a temporary state, this in-place conversion leaves lots of chunks with only a small number of entities each, causing fragmentation. It would be awesome if there were an additional check: when another chunk of the destination archetype can accommodate all of the existing chunk’s entities, move the entities into that chunk rather than performing the in-place conversion.

Most of the time, I find myself using sub-optimal structural change sequences just to avoid this edge case.
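The requested check could look something like the following Python sketch. It is purely illustrative: the function name, the return values, and the best-fit choice of destination are all assumptions, not a description of how Entities works. It also assumes source and destination chunks share one capacity, which seems reasonable since a zero-sized component does not change per-entity size.

```python
def plan_tag_change(chunk_count, chunk_capacity, destination_chunk_counts):
    """Decide how to apply a tag add/remove that affects every entity in a chunk.

    chunk_count: entities in the chunk being changed.
    destination_chunk_counts: entity counts of existing chunks that already
    have the destination archetype.
    Returns ("move", destination_index) or ("convert_in_place", None).
    """
    fitting = [
        (chunk_capacity - count, index)
        for index, count in enumerate(destination_chunk_counts)
        if chunk_capacity - count >= chunk_count
    ]
    if fitting:
        # Best fit (an assumption): fill the fullest chunk that still has room.
        _, index = min(fitting)
        return ("move", index)
    # No existing chunk can take the whole group, so keep the cheap conversion.
    return ("convert_in_place", None)
```

With this rule, a nearly empty chunk in a temporary archetype gets merged into an existing chunk, while a full chunk still benefits from the in-place conversion.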

Thread-Local Bump and Stack Allocators

I have a job where there is a loop, and inside that loop is a callstack where each stack frame has logic that needs to perform allocations. The allocations add up fast, but usually have very short lifecycles and are mutually exclusive to each other. I would love special allocators that can be rewound directly in the job to recycle the memory efficiently.
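As a sketch of the kind of allocator meant here, the following Python class models a bump allocator with mark and rewind over a single fixed block. It illustrates the general technique only, and the class and method names are hypothetical. A real version would hand out raw pointers and would exist once per worker thread.

```python
class BumpAllocator:
    """Linear allocator over one fixed block; memory is freed only by rewinding."""

    def __init__(self, capacity):
        self.memory = bytearray(capacity)
        self.offset = 0

    def allocate(self, size, alignment=1):
        """Return a writable view of `size` bytes at an aligned offset."""
        start = (self.offset + alignment - 1) // alignment * alignment
        end = start + size
        if end > len(self.memory):
            raise MemoryError("bump allocator exhausted")
        self.offset = end
        return memoryview(self.memory)[start:end]

    def mark(self):
        """Capture the current position, e.g. on entering a stack frame."""
        return self.offset

    def rewind(self, marker):
        """Release everything allocated since `marker` in one step."""
        if marker > self.offset:
            raise ValueError("marker is ahead of the current position")
        self.offset = marker
```

Each stack frame in the loop would call `mark()` on entry and `rewind()` on exit, so short-lived allocations from one iteration reuse the same bytes in the next.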

WorldUpdateAllocator in Baking Systems

WorldUpdateAllocator doesn’t get rewound in baking worlds. Therefore, we have to use TempJob allocations everywhere when baking.

Optional Containers in Jobs

Sometimes, I really want to specify that a container in a job is meant to be allocated in the job via Allocator.Temp, without having to use the nuclear attribute [NativeDisableContainerSafetyRestriction]. Other times, I might have a job that takes a variable number of DynamicComponentTypeHandles, and I never know what to populate the unused slots with. Again, disabling container safety is really bad because then if the user messes up job dependencies elsewhere, the issue may go unnoticed.

IJobEntityChunkBeginEnd with Default Methods

IJobEntityChunkBeginEnd doesn’t support a derived interface that uses default interface methods, because the source generators generate code that directly calls the methods rather than using a generic static invoker.

Where is LOD Crossfade?

I’m listing this here as this is a potential item on my todo list. I shouldn’t have to beat you to it. But you left such a mess with LODs from back when you tried to do hierarchical LODs and then abandoned it that now you don’t even want to touch it again.

DynamicBuffer Allocations

I believe Dynamic Buffers should be allocated from a custom ECS-managed allocator and not using Allocator.Persistent. The allocations are often small and would be better suited with a pool. This is starting to become a performance problem for me during initialization due to all the consecutive small allocations in my ICleanupBufferElement types.

Hacks and Ugliness

These are some of the other issues I have run into with the Latios Framework that required explicit workarounds that were far from ideal.

Bootstraps and Baking Customizations

We have ICustomBootstrap for setting up systems at runtime. Why can’t we do the same thing in the Editor? I ended up extending ECS to do that, but I do it by accessing some internal Action after the Editor World is created and then try to replace it. And then I also have a hack to rebuild the EditorWorld as a menu option when a buggy editor system goes haywire and the full Editor state is corrupted. Unfortunately, this hack isn’t bullet-proof and sometimes causes the wrong world to run an update or two, which then fails and throws errors in the console.

Then there’s baking. I have a custom Skinned Mesh Rendering solution. Why can’t I turn off the built-in Entities Graphics Skinned Mesh Renderer baking without turning off the entire baking of Entities Graphics? Once again, I hacked this by using a custom baker list mechanism that seemed to be created for tests. I do this at startup to create a custom bootstrap callback, and then for each baking world, I have a system in OnCreate assign a RateManager to one of the first ComponentSystemGroups baking uses, and then in that callback I disable systems I don’t want and then inject systems with the DisableAutoCreation attribute. Why do systems have that attribute? It is because those systems are for an optional feature that users may or may not want. Why do I use RateManager? It is the only way to ensure the already included ComponentSystemGroups have had their OnCreate() called when I inject the systems, because otherwise I can’t add systems to them.

And while we are on this topic, I would greatly appreciate a flag in UpdateBefore/After attributes to suppress warnings about the systems being in the wrong groups. Such systems might just not be installed at all. A user may have replaced it with a custom version or something. Bonus points if they can be suppressed externally.

Personally, I think the whole bottom-up automatic injection design of systems is problematic. It makes it difficult for users to optimize system ordering for better worker thread occupancy, unless they want to decorate their systems with false dependencies. It becomes impossible to know just by looking at the code what the actual order of systems is if there’s a bug where some data is getting changed in the wrong place. And it makes it really hard to copy a system into a different project. Also, how do you define a system to run more than once in a frame?

A top-down approach solves all these problems, and the Latios Framework has the mechanisms in-place to support this. Unfortunately, this conflicts with a lot of existing paradigms. I don’t know the right answer.

Lastly, the whole ICustomBootstrap thing does not play well with embedded samples inside of packages. Bootstraps should be settings assets that can be swapped in the Editor. This feature is planned for a future Latios Framework version, but I wish I didn’t have to be the one to do it.

Collections in Components

Why are collections married to singletons? Why are there even singletons? Do you truly only want one of something, or do you just want to know which entity is the entity? The Latios Framework solves these use cases independently with blackboard entities and collection components. The latter has similar problems as managed structs, except this time user API is fully Burst-compatible. But if you are from Unity and want to do something more official, please reach out to me!

BlobAssetStore

Currently the Latios Framework has this SmartBlobber mechanism for creating blob assets in baking systems based on a “request” protocol. For each blob type, the user has to register the type so that a generic system can properly ref-count and store blobs in the BlobAssetStore (deduplicating in the process).

I currently face two problems that I have hacked around. First, adding concrete types to BlobAssetStore is not Burst-compatible. I have to use internal APIs to precompute the type hash prior to the job. Second, I would much rather add UnsafeUntypedBlobAssetReference blobs directly so that I don’t need generics. Honestly, I think the BlobAssetStore should use Burst’s type hashes instead of System.Type.GetHashCode and expose that as API for working with UnsafeUntypedBlobAssetReference.

Smart Bakers Alternative?

The Latios Framework Smart Blobbers are a powerful concept. They allow baking systems to generate blob assets without necessarily knowing nor caring how those blob assets will be used. User bakers can request blob assets to be created. Baking systems create the blob assets, then pass the blobs back to the user to do what they please. The issue is how to pass those blobs back to the user without making the user write a custom baking system, which is error prone. The solution I came up with is to create a generic baking system and a “bake item”. The bake item is a stateful IComponentData which does the original baking, and then later receives a callback with a reference to EntityManager and the primary entity to resolve any blob asset requests and assign them to components. This works, but it involves generic systems, and it is still somewhat unsafe. Ideally, there would be some way to have additional baker callbacks dispatched by a baking system. And inside these baker callbacks, the baker is only allowed to change or remove components it added. I’m open for ideas for improvements and/or alternatives!

Transforms

Transforms used to be a high severity item. But Pre.65 addressed much of the mess. What we have today is a heavily streamlined and simplified version of Transforms V1’s execution model. And while I believe better can be done (I wrote my own QVVS transform system that I really like while V2 was in chaos), design-wise I think we are finally on the right track again.

But there are some lingering issues with Unity’s implementation that have plagued every iteration of their Transforms. Fortunately, they are easy to fix without any API change, but like come on. Just fix them!

First, LocalToWorld does not need to be a float4x4. A float3x4 is sufficient, and will have better rendering performance. Additionally, ParentSystem and LocalToWorldSystem are both non-deterministic for no good reason. Both can be made deterministic and faster. And the Child buffer has a default capacity of 8. Why does that need so much chunk space?

Auto-Load Subscenes Synchronously?

An optional feature of the Latios Framework is a custom scene manager that automatically destroys runtime-created entities. This scene manager is focused on actual scenes, not subscenes. And it is designed for such scenes to be swapped synchronously. It is really important that the subscenes set to auto-load load synchronously, as there are first-frame-of-scene flags that a lot of gameplay features like to use when using real scenes and the scene manager. Right now, my solution for this involves fetching all the entities with RequestSceneLoaded, adding the BlockOnStreamIn flag, and then iterating through the ResolvedSectionEntity buffer and adding the same flag to all those entities. That last part seems unnecessary and wrong. Am I missing something?

Procedural Meshes in Baking

A really common use case is to procedurally generate meshes for Mesh Renderers. While the algorithms work in bakers fine, getting the Mesh Renderer Baker to accept this and not bake a null mesh or something would be great. Currently, I replaced the MeshRenderer baker with a custom version which checks if there is not a subclass of some other MonoBehaviour before continuing. If there is, it leaves it up to that other MonoBehaviour to do custom baking instead, providing the custom mesh and list of materials to use for the renderer. You can see this in action in LSSS, as all the capsules are generated procedurally at bake time.

Entities Graphics Baking is Buggy

Entities Graphics baking is just buggy in general. The baking system adds components that don’t get removed by reversion. RenderMesh works on lists of materials causing lots of defensive reallocations to avoid accidental sharing, only for it to only consider the first material when deduplicating. And if any of the materials are transparent, all the opaque materials on the same mesh get rendered with split batches.

The range MMI mechanism at runtime is actually an excellent and powerful feature. But the baking makes a total mess of it. I personally rewrote the entire baking stack to fix all of this, and everything is amazing again.

Burst Generic Jobs Can’t Be Scheduled in Burst

Psyshock uses generic jobs in Physics.FindPairs() using a pattern that allows Burst to detect and compile the jobs both in the Editor and in builds without having to explicitly register the generic types with attributes. Unfortunately, the ILPP can’t pick up on it and patch these jobs to be Burst-schedulable. There should not be a discrepancy!

Currently I am relying on reflection to find and call the EarlyJobInit() methods myself for specific generic types.

Experimental Skinned Mesh Rendering

The whole Skinned Mesh Rendering solution in Entities Graphics is problematic. It generates GC every frame, it doesn’t scale, and even the public API types of SkinMatrix and BlendShapeWeight fundamentally prevent more efficient algorithms like the ones Kinemation uses. I’ve been told numerous times that the skinned mesh rendering design is “experimental”. If that’s the case, why is it in the released version of Entities Graphics without any guard flags?

I’m only asking this because I have a sliver of hope that Entities Graphics may adopt a design closer to Kinemation in which case I can delegate some features of Kinemation to the official package.

Default Groups

I frequently run into issues where default groups accidentally get added to other groups if I don’t explicitly remove them from the list. Since these are systems that Unity will manually create, they should have a [DisableAutoCreation] attribute. At least now because they are partial, I am able to fix this with asmref.

TypeManager.GetAllSystems [DisableAutoCreation] Bug

If you try to get all systems, the systems with the DisableAutoCreation attribute don’t get added to the list even if you specify All like the XML documentation suggests. I have to use reflection for this now, which sucks.

Why Do I Need a GraphicsBuffer to Get Blend Shapes?

Is there a more performant way to get the raw blend shapes data (the deltas, not the animated parameters) than queueing up a bunch of async readbacks and then batch-completing them inside a baking system?

Why is there No Burst-Compatible Way to Read Audio Clips?

I want to bake audio clip samples in blob assets. Current API doesn’t offer any NativeArray API. It is slow. Also, if I could get the raw compressed bytes and compression codec of audio clips, I could do my own decompression at runtime without having to do my own compression. That would be awesome!

Copying Shared Components Type-Agnostically

One of the features of blackboard entities in the Latios Framework is that they merge components of blackboard config entities whenever a subscene containing them loads. This allows the user to spread config authoring data across multiple GameObjects. However, doing this merging at runtime is surprisingly difficult. While it is easy to get the ComponentType list to copy from one entity to another, it is significantly more difficult to actually copy those types. For unmanaged components, we have the tools now. But for managed components, especially shared components, it is problematic. Currently, the Latios Framework uses reflection, but I would love for there to be a proper EntityManager.AddComponentFromOtherEntity(Entity src, Entity dst, ComponentType ct) API so that I can Burst-compile this whole thing.

BufferAccessor Missing Flexibility

You can get a read-only pointer to components in a chunk, even if the ComponentTypeHandle is declared with write access. You cannot do the same for a BufferAccessor.

Incremental Baking Discards Chunk Components

I had a bug where queries weren't being matched because of this. The bug only happened in an editor system while the subscene was open. It was really annoying.

Annoyances

These are little things in the API I think should be improved, but don’t have a major impact on the Latios Framework.

Entity Queries with Enabled Components

These are a mess. There are lots of combinations that aren’t supported correctly, and lots of other cases where the dependencies aren’t brought in correctly. WithPresent fixed a lot of issues, but not everything respects that. And there are still dependency problems, especially with WithAny on enabled states.

Missing Ref APIs

I have large chunk components. Reading/Writing by ref is way faster. I have extensions to do this, but official support would be better.

Similarly, I’ve also noticed ref gaps for EntityManager.

Idiomatic Foreach is Insufficient

The biggest issue I have with idiomatic foreach is that it is really clunky for large queries. With Entities.ForEach, you could put each argument (type and variable name) on a different line. That doesn’t really work well with idiomatic foreach. I recognize this is a hard problem, and I don’t have a proposed solution yet.

But at the very least, make it so that we can have Entity first in the tuple. It is difficult to articulate why, but the Entity being at the end annoys me and most others I talk to.

As an alternate workflow of idiomatic foreach, I have thought about defining an IAspect for each foreach loop. Is there a way to tell an IAspect to be hidden in the inspectors?

Other things I want are the ability to iterate chunks without fetching a NativeArray<ArchetypeChunk> and the ability to provide a custom query to idiomatic foreach (or extract the query from it).

Lookups in IJobEntity

Codegen already injects the ComponentTypeHandles into IJobEntity. Can we have an [Inject] attribute for lookups and Time to have codegen do the same? That would reduce a bunch of boilerplate.

Unity Systems Outside Unity Namespace

I keep finding these, and they always catch me off guard in custom bootstraps. Put them where they belong.

Iterating Just Entities in a Query

This isn’t possible in idiomatic foreach. I have to have some additional dummy read component around, or deep copy the entity array.

Iterating Chunks in a Query

We have to allocate a NativeArray<ArchetypeChunk> if we want to do this. Can we get a proper enumerator?

NativeStream Woes

NativeStream doesn't respect alignment, gets its counts messed up when writing piecewise but reading in bulk or vice-versa, can't store writes 4kB or greater, can't defer allocation with a schedule-time known allocation size, etc.
