Proposed blit component for avatars and worlds to use in VRChat
Canny post made requesting this to be implemented: https://vrchat.canny.io/feature-requests/p/graphicsblit-scripts
This is an update to the original blit script request since there were concerns that the original request was underspecified and people would ask for more functionality from it after implementation. This script has been updated as such to allow a majority of the relevant functionality that shader developers were interested in after the original basic blit scripts were published.
These scripts have been tested and confirmed to work completely in Unity 2018.4, 2019.3, and 2020.1.0a.
If you have any questions, requests, or feedback, please open an issue on this GitHub or contact me through Discord at Merlin#0001.
- Blit Component Specification
- Table of contents
- Proposed features
- Rationale
- VRChat-specific considerations
- Example Scenes
These features are implemented in the current example scripts and have example scenes that cover basic use cases of each of these features.
Major changes in functionality and features have #defines for each of them at the top of the BlitComponent and BlitController. It will probably be easier to read the code if you choose a set of defines and remove the code that is not active for that set of defines.
These are things that should be considered required in some form in order for this to be useful to people.
Allow specification of a target render texture, and optionally a source texture to reduce the number of materials needed for copying.
This passes the transforms of the object that the blit script is attached to into the shader. In this implementation you are provided with _BlitToWorldMatrix, _BlitToLocalMatrix, and _BlitWorldPosition. This is necessary for many interactive applications.
This is why the BlitComponent requires a renderer on the object it is attached to. This uses the fact that Unity implements material property animations as a property block which is set on the renderer.
If you animate material properties on the attached renderer, the blit component will copy those properties to the blit invocation. There are two implemented mechanisms for this, and you switch between them using the GRAPHICS_BLIT_USE_COMMAND_BUFFER define.
The default behaviour is to have GRAPHICS_BLIT_USE_COMMAND_BUFFER enabled since it is much easier and less error prone than not having it enabled. With this enabled, the blit functionality is emulated using a command buffer that uses DrawRenderer(). This is because you cannot pass a Property Block directly into a Graphics.Blit() call.
With GRAPHICS_BLIT_USE_COMMAND_BUFFER disabled, this uses Graphics.Blit internally, and the user must specify a list of the properties that are animated and need to be copied, since Unity provides no way to view the properties set on a property block. This has the other downside that, while testing in the editor, the animated properties will be set on the material asset and will persist after leaving play mode.
These are things that are optimizations for specific use cases or quality of life features.
This allows the user to specify a list of transforms that get passed into the shader with the float4x4 _BlitToWorldMatrixArray[] and float4x4 _BlitToLocalMatrixArray[] uniforms. This functionality can be handled by having a blit component for each transform that needs to be written, but it can be cumbersome to do this and has more overhead.
If GRAPHICS_BLIT_SUPPORT_CUBE_3D is enabled, you can use Cube and 3D render textures for the target render texture. If one of these types of render textures is used, the x component of the _BlitPassInfo uniform will specify the current slice that is being rendered to.
This is supported when GRAPHICS_BLIT_USE_COMMAND_BUFFER is enabled. The mesh particle emission uses this on a skinned mesh to record vertex positions and motion. To work properly on skinned meshes, this uses the forceMatrixRecalculationPerRender flag, which only exists in Unity 2018.3 and later and is required for the skinned mesh to update in time for the blit. Without the flag, the skinned mesh data will be from the last frame, which isn't acceptable for many use cases.
There are a number of reasons that we originally proposed this script, and there are some new reasons that have only become clear after we have been able to use Custom Render Textures. We still think that this is a reasonable and useful request.
Custom Render Textures (CRT's) are cool in theory, but in practice they are a black box thing that executes logic from an asset file. CRT's don't serve any particular purpose in an environment where you could script since the same behavior and feature set could be implemented in a few hundred lines and would be much less liable to break.
The original blit script request was written when VRChat was on Unity 5.6. We did not make this clear at the time, though we should have since Unity incorrectly touted CRTs as a magical in-engine solution that would solve everything, but we were well aware that Custom Render Textures existed in the next Unity version (2017.4), and the original request was made despite that. This was because of two major reasons:
The first reason was that CRT's are a niche Unity black box asset. If things break with them, the best that VRC could do is attempt to work around the issues and make a bug report with Unity that may or may not be fixed in some number of years.
The second and more important reason was that it was already clear that CRT's expected you to be able to script to set any kind of external data in them. Since they were treated as assets there was no performant way to pass data into the materials that they used. So for instance, you couldn't tell them about the location of your hands for basic GPU particles without the use of a Camera component, or whether the system was turned on.
The first reason ended up being a real issue once we actually got to Unity 2017.4 and quickly realized that CRTs placed on avatars would stop updating under various conditions, starting with https://vrchat.canny.io/bug-reports/p/custom-render-textures-bugged-on-avatars, then in later patches https://vrchat.canny.io/bug-reports/p/custom-render-textures-break-when-opening-menus. This made them not particularly useful on avatars since it required you to have the avatar cleared from cache to work.
Luckily they work in worlds correctly, you might say. But they have fundamental flaws in their design that are at odds with using them effectively for anything other than the simple single CRT examples or systems that are solely feed forward systems. I wasted 2 days attempting to get Custom Render Textures to function for my fluid simulation world, but ended up reimplementing the system in a few hours with cameras since CRTs constantly broke the systems in unexpected ways.
The root of their design flaws is that they cannot handle cyclic dependencies at all. CRTs boast the ability to automagically figure out the correct update order for a series of CRTs that depend on the results of each other. The caveat is that they fail catastrophically on cyclic dependencies. If you have any cyclic dependency in updates, Unity will spam errors in the editor and will shuffle the update order of the CRTs seemingly randomly in game. This is a massive issue since the main use of them, in my opinion, is to handle simulations. Simulations are very frequently recurrent and have cyclic dependencies on the previous frame's update results, and for anything remotely complex they require more than one render texture pass, so the double buffered flag does not help. I hoped that using OnDemand mode and manually calling update on them would allow you to specify a manual update order, but all that calling update on a CRT does is tell Unity that the update needs to run at some point in the current frame, in an order determined automatically by Unity's dependency finding.
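To make the ordering problem concrete, here is a small Python sketch. It only models the general idea of ordering updates by their dependencies with a generic topological sort; it is not Unity's actual dependency solver, and the texture names (velocity, pressure, display) are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError


def update_order(dependencies):
    """Return an order in which every texture updates after the textures it reads.

    `dependencies` maps a texture name to the set of textures it reads from.
    Returns None when the graph contains a cycle, because no valid order exists.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# Feed-forward chain (hypothetical names): display reads pressure, pressure reads velocity.
feed_forward = {"pressure": {"velocity"}, "display": {"pressure"}}

# Recurrent simulation: each texture reads the other's result from the previous pass.
recurrent = {"pressure": {"velocity"}, "velocity": {"pressure"}}

feed_forward_order = update_order(feed_forward)
recurrent_order = update_order(recurrent)
```

A feed-forward chain has exactly one sensible order, while the recurrent case has none, so any automatic scheduler has to either reject it or pick an arbitrary order. A manually specified update order avoids the question entirely.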
The other issue is more of a problem from the performance side. While CRTs provide a way to call n updates per frame, they double buffer every update. This is the other main reason I didn't use them for my fluid simulation. It may not be immediately clear why this is an issue, or that it is entirely avoidable, if you're not familiar with how the GPU works or how simulations are usually handled on the GPU. Unless you are using Unordered Access Views (UAVs), you cannot modify data in a texture at the same time that you read from it. In Unity, UAVs are implemented using a Render Texture with the enableRandomWrite flag enabled and binding it with Graphics.SetRandomWriteTarget() or one of the equivalent functions for command lists. To work around the limitation, older game engines without access to compute shaders, and Unity, traditionally copy the updated contents of a render texture into a second one so that you can read from the second one while you're writing to the main render texture. This is referred to as double buffering.
This is perfectly fine if you want to run one update on each buffer where, before or after it gets updated, it copies to a second buffer for the next update to use. The issue is when you want to do multiple updates. For the fluid simulation I needed to do 10-20 diffusion updates for some parts of it. The problem is the way that CRTs currently handle sequential updates of the same buffer. If I wanted to do 20 updates of the same buffer in sequence, then Unity needs to copy the buffer 20 times as well. Those 20 copies are completely unnecessary. Unity's documentation implies that this may be fixed at some point, but it has not been fixed as of 2020.1 from what I can tell. The reason those copies are unnecessary is that you can do what is sometimes referred to as ping-pong double buffering when you have a sequence of updates. This means that you swap the target and source render texture during each update. So if I wanted to update a buffer 4 times, using Unity's CRT update handling the sequence would look like this: Update -> copy -> update -> copy -> update -> copy -> update -> copy.
Using ping-ponging, the update order instead looks like this, with A and B being the two buffers that exist for double buffering: Update A to B -> update B to A -> update A to B -> update B to A. This uses the two buffers to handle double buffering by swapping them between each update, so you always get the last pass's data without needing to copy it around redundantly. For the fluid simulation specifically, this was a massive issue since the updates are entirely limited by bandwidth. The actual calculations are so simple that they are not measurable. So for no particular reason Unity makes updating the fluid simulation take nearly twice as long on the GPU by copying the buffers around redundantly.
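The difference between the two schemes can be sketched on the CPU in a few lines of Python. This is only an illustration under simplifying assumptions: plain lists stand in for render textures, a 1D three-tap average stands in for a diffusion pass, and all function names are hypothetical.

```python
def diffuse(src, dst):
    """One diffusion pass: read only from src, write only to dst."""
    n = len(src)
    for i in range(n):
        left = src[max(i - 1, 0)]
        right = src[min(i + 1, n - 1)]
        dst[i] = (left + src[i] + right) / 3.0


def run_with_copies(state, passes):
    """Copy the main buffer to a snapshot before every update."""
    main = list(state)
    snapshot = [0.0] * len(state)
    copies = 0
    for _ in range(passes):
        snapshot[:] = main       # full-buffer copy, pure bandwidth cost
        copies += 1
        diffuse(snapshot, main)  # update main while reading the snapshot
    return main, copies


def run_ping_pong(state, passes):
    """Swap the roles of two buffers between updates instead of copying."""
    a = list(state)
    b = [0.0] * len(state)
    for _ in range(passes):
        diffuse(a, b)  # read A, write B
        a, b = b, a    # swap references only, no data is copied
    return a, 0


initial = [0.0, 0.0, 9.0, 0.0, 0.0]
copied_result, copied_count = run_with_copies(initial, 4)
ping_pong_result, ping_pong_count = run_ping_pong(initial, 4)
```

Both functions produce the same final values, but the first performs one full-buffer copy per pass while the second performs none. For a bandwidth-bound pass, each of those copies costs roughly as much as the update itself.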
Of course, if we had compute shaders and readWrite texture binding, we could do many of these updates in place for slightly better performance and half the VRAM usage. But that is not what this canny is for, since the blit component is intended for use on avatars as well as worlds.
There are other issues with them that probably can't trivially be fixed from VRC's side and I won't go into detail here, so message me on Discord if you want more details.
Currently, the Unity Camera component is the most controllable and robust method that we have for storing data on the GPU between frames. However, VRC has left it in a weird, one-of-a-kind limbo for filtering, where only your friends are able to see the camera component update, and they have no way to turn it off without blocking your avatar. The camera component is not usually intended for doing simulations in shaders, but it has been adapted for that by the community, similar to how older, more limited engines and rendering APIs used fake cameras for post processing or things like shallow water equation simulations. I'd argue that camera components should still have a place in the performance settings whenever we get them, since they have important use cases that the blit components don't cover. But the blit component is a narrower use case that can be much more optimized, and it deserves a place of its own with that consideration.
The friends-only camera filtering is also not very convenient because of a caching issue: if you friend someone who you want to show camera-dependent stuff to, and they have already loaded your avatar with the cameras on it, they will need to restart their game for the cameras to work.
Most camera-based systems are also necessarily built on using the UiMenu layer for culling objects that need to only be visible to the camera. Since I made the original GPU particles using cameras on avatars, people have used the UiMenu layer to prevent the cameras that handle updates for their simulations from rendering objects unnecessarily. In VRC, usually all objects under your avatar hierarchy will be moved into either the PlayerLocal layer if the avatar is loaded locally, or onto the Player layer if the avatar is loaded remotely. UiMenu is the one exception to this: any objects on this layer will not get reassigned to the Player or PlayerLocal layer. This was not only due to performance, but also a measure made to prevent crashing. Prior to Unity 2017, if you executed a grab pass on a texture format with 128-bit depth (ARGB Int or ARGB Float), the game would instantly hard crash. This was particularly bad because portals would often cull into the view of an update camera when they had grab passes, and this would crash you. This was fixed in the current version of 2017 that VRC uses, so the crashing issue isn't a problem now, though performance is still a big concern. Unity has optimizations to cull layers that the camera isn't viewing, but if UiMenu were removed from avatars, the cameras would need to cull against every avatar, which could take upwards of 0.4ms extra per camera on the CPU. Even when using UiMenu, cameras still have a flat overhead of somewhere on the order of 0.3-0.4ms that can be avoided for the most part with the blit scripts.
I bring this up here because it's undocumented functionality that people depend on knowingly or unknowingly. A number of my systems would need to be entirely refactored if UiMenu disappeared from avatars. QuantumHero's GPU particle asset, which is the most popular use of avatar cameras to my knowledge, has a YouTube tutorial video about how to configure it. At the point of writing this, that video alone has 13,000 views. As with most tutorial videos of its nature, you can probably assume that is a rough estimate for the number of people who have attempted to set it up. This is not counting people who have cloned avatars that use the system, who probably vastly outnumber the views on the video. If UiMenu were removed without handling for patching old avatars, it's possible that tens of thousands of people would be affected. This isn't just supposition about "what if VRC did that." It was nearly removed a few months ago in an attempt to remove the self avatar stations (the relevant canny is https://vrchat.canny.io/feature-requests/p/allow-sitting-in-your-own-station/). Luckily we had the opportunity to tell the ones removing the self stations about the unintended consequences of removing UiMenu without special handling before it was pushed, so they were able to remove self stations properly without relying on brittle layer checks. But they told us at the time that UiMenu is still liable to be removed in the future, so blit scripts would provide a viable alternative ahead of that potential removal. This is an opportunity to not shoot first and ask questions later: the blit scripts could be implemented many months or years before UiMenu needs to be removed from avatars. By then, the benefits of using them over cameras should have moved most people over to blit-script-based systems. And people who haven't switched could be pointed to up-to-date systems instead of being left out in the cold, as has happened with many of these removal patches in the past.
Since cameras currently only work for friends, a massive point of having the blit scripts on avatars, and the biggest point for some people, is that they could be filtered via the performance and safety systems instead of being gated behind friends. They would probably fall under Shaders if you were to use the current safety system categories, and would be a category of their own in the performance system.
While it's great that we still have cameras to work with, I've seen first hand how having them gated behind friends has stifled creativity using cameras within the community. There are numerous shader developers I've taught to use cameras for running simulations and logic, but most of them have stopped or gotten bored of it because they can't easily show off to strangers, and showing stuff to their friends group gets old quickly. Saying something along the lines of "but they could just add them as a friend" discounts the much more important interactions where someone is showing off something to one person, and other people come to observe. Due to the bug with avatar caching, this interaction is nearly nonexistent because in order for it to happen, you need to friend the person and they need to rejoin. This is too much of an effort for many people. It also requires a catalyst of already having some people in the room friended. You can't just walk into a room and show how your work has paid off.
The original proposal was made more than a year ago, expecting Udon to be something far off in the future. Since Udon is being released for worlds soon, and will at some point have Graphics.Blit(), what is the point of having this for worlds? I'd argue that being able to run 50+ blits in a time on the order of tenths of a millisecond would be good, as opposed to multiple milliseconds, which is where Udon performance is at the moment. Of course, once major issues have been resolved and features have been implemented, I'm sure there's some room for improvement on Udon performance.
But given that performance will be improved over time, the original point of blit scripts for worlds will be satisfied by having Graphics.Blit for Udon. The original Graphics.Blit proposal was made expecting Udon to arrive much later, and was just a stop-gap for worlds until Udon. For me, the end game for worlds was always to have the capability to execute compute shader kernels and bind the relevant resources via Udon. That is largely a separate thing from the blit scripts, so I'll leave it at that with the canny https://vrchat.canny.io/vrchat-udon-closed-alpha-feedback/p/add-nodes-for-compute-shaders
Avatars are a different story. There are rumors that Udon will be supported on avatars in some capacity in the avatar 2.0 work that will be released at some nebulous point in the future. This section assumes that avatar 2.0 does support Udon, and that it miraculously exposes the Graphics.Blit function and material parameter setters that would be needed. If avatar 2.0 doesn't support Udon, then the blit scripts should be part of the avatar 2.0 patch in my opinion, or put in even earlier since they are independent from most systems.
Provided that Udon is supported on avatars, regardless of how it functions, I'd argue that the blit scripts should be separate from a performance standpoint since they primarily operate on the GPU, and any Udon equivalent would likely have somewhat more CPU overhead than having the scripts as part of the client in IL2CPP-optimized code. If Udon does support Graphics.Blit on avatars, it would still be nice to have regardless of whether the blit scripts get implemented, since there may be some case where using Udon for blitting is necessary, though I've done my best with v2 to cover most of the possible use cases that blit would be able to.
This is the speculation corner where I speculate on ways that Udon might work on avatars and potential issues with those ways that should be taken into consideration if it actually works in those ways.
Since it's difficult to measure the time a given Udon script may take, and since the Udon VM already has an execution timeout, it's possible that Udon scripts for avatars could have a much lower timeout than world scripts. If you want to only allow blit script functionality through Udon for avatars, or any functionality really, and have a timeout, I'd argue that you should guarantee some minimum number of instructions that execute every frame regardless of the timeout, since some are critical to things working correctly. For instance, many of my blit scripts rely on using local coordinate spaces for storing data as an optimization. The entire coordinate space is shifted every frame to follow my avatar and the attached mesh that shows what it contains. If script execution randomly times out on some low-end systems after a couple of function calls, then that will randomly break my system and cause it to jitter around. Similarly, if some Udon script moves an object to your hand position every frame, then it obviously needs to run every frame or the object will become disconnected.
If Udon on avatars were to run on a separate execution thread, then obviously you'd need to make sure blits are synced, or make sure that they work from an async context. And that brings in the question of what if the blit update happens out of step with the frame, skips a frame, runs multiple updates in a single frame, etc.
Anyway, I can only speculate, and there's not much point in it until Udon is confirmed or denied for avatar 2.0. But regardless of speculation about how Udon would work on avatars, I think that there is a case to be made for allowing blit scripts separately from it from a performance standpoint.
This script allows much lower CPU overhead for using render texture loops to run stateful logic in shaders for avatars and worlds. It is approximately 10x as fast as using Camera components on the CPU side since a blit does not need to run culling and a number of other things. This brings each pass from ~0.3ms to ~0.03ms. Using the command list version has a small flat overhead (~0.05ms for my system) on starting the command list, so I would recommend batching all of the updates of the controller into a single command list invocation for updates of blit controllers that aren't dependent on camera renders.
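As a rough worked example of what these numbers imply, the Python sketch below plugs the approximate figures above into a simple linear cost model. The model itself is an assumption (per-pass costs are treated as additive, and the flat overhead is paid once per command list invocation), and the function names are hypothetical.

```python
CAMERA_PASS_MS = 0.3             # approximate CPU cost per camera pass (figure from the text)
BLIT_PASS_MS = 0.03              # approximate CPU cost per blit pass (figure from the text)
COMMAND_LIST_OVERHEAD_MS = 0.05  # approximate flat cost per command list (figure from the text)


def camera_cost_ms(passes):
    """Estimated CPU time when every pass is a Camera component."""
    return passes * CAMERA_PASS_MS


def blit_cost_ms(passes, command_lists=1):
    """Estimated CPU time for blit passes spread over some number of command lists.

    Assumes costs add linearly, which is a simplification.
    """
    return passes * BLIT_PASS_MS + command_lists * COMMAND_LIST_OVERHEAD_MS


passes = 20
camera_total = camera_cost_ms(passes)
batched_total = blit_cost_ms(passes, command_lists=1)
unbatched_total = blit_cost_ms(passes, command_lists=passes)
```

Under this model, 20 passes cost about 6 ms with cameras, about 0.65 ms as blits in one command list, and about 1.6 ms if each blit starts its own command list. That gap is why batching is worth the effort.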
Some basic metrics have been included at the bottom of BlitController.cs for consideration with rough recommendations for corresponding effects on performance. Of course these metrics aren't inclusive of the performance for the shaders being executed. See the definitions of the metrics in the code for more detail.
The metrics are listed below. Some of them are not applicable depending on what is enabled or disabled:
- Total blit count
- Blit GPU bandwidth
- Animated parameter count (only relevant if not in command buffer mode)
- Reference transform count (only relevant if reference transforms are enabled)
All of these metrics return ints or floats and have some rough performance metrics I took, so if you decide to extend the performance system into a proper scoring system in the future instead of taking the worst performing thing for performance rating, it should be easy to convert to scores. The example scenes can also be modified for quick performance tests.
The performance system still doesn't/can't tell you that someone has a grab pass in their shader because Unity doesn't expose that info and you don't have an easy way to change that since you don't have the engine source. Grab passes are a larger hit on performance than most of the things you'd see directly using the blit scripts. For instance, playing the game at 4k results in each grab pass taking 0.7ms. This is obviously worse when you're playing in VR with supersampling. This is multiplied by mirrors and any cameras in the world. So it quickly gets out of hand when you have multiple people with grab passes sitting around. My unoptimized, basic GPU particle sample implementation included in this repository takes 0.32ms to update 1 million particles. With properly optimized implementations like the ones I usually use, this time is halved to ~0.16ms since everything to describe a particle can fit into 1 pixel instead of 2. Of course, the rendering of the actual 1 million particle mesh takes the vast majority of time in this case, where it can take >4ms if you cover your face with it depending on the scale of the particles. But that's not really relevant to blit performance since that mostly falls under the triangle count on the performance system.
From a performance standpoint, a majority of the practical applications of blit scripts would be much less performance heavy than GPU particles. Many applications just need blit to record data, such as finger positions for custom markers, or to have a couple of entities follow your player around. These kinds of things would take on the order of microseconds to execute on the GPU.
If you use the command buffer update mode, it'd probably be a good idea to have the blit controllers register with a manager that dispatches all of their updates to a single command list, since there is a tiny overhead from executing the command list as noted in the performance section. If you do this, make sure to provide a public function that allows execution by worlds at specific times with a command list maintained by the blit controller. A specific use case of this would be executing a blit script directly after a camera in the world has rendered a frame via the OnPostRender event on an Udon behavior so that updates are not behind by a frame. Implementing this kind of thing is pretty dependent on how VRC has things set up and it should only take a couple of hours to do, so I don't have an example implementation of it.
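The registration pattern described above could be structured roughly like the following Python sketch. It is language-agnostic pseudostructure rather than Unity code: CommandList, BlitController, BlitManager, and every method name here are hypothetical stand-ins, and a real implementation would record into Unity command buffers instead of Python lists.

```python
class CommandList:
    """Stand-in for a GPU command list; records commands and counts executions."""

    def __init__(self):
        self.commands = []
        self.executions = 0

    def add(self, command):
        self.commands.append(command)

    def execute(self):
        """Submit everything recorded so far as one invocation, then reset."""
        self.executions += 1
        executed = list(self.commands)
        self.commands.clear()
        return executed


class BlitController:
    """Hypothetical controller that owns a number of blit passes."""

    def __init__(self, name, passes, manual=False):
        self.name = name
        self.passes = passes
        self.manual = manual           # True: the world decides when it runs
        self.own_list = CommandList()  # list maintained by the controller itself

    def record(self, command_list):
        for i in range(self.passes):
            command_list.add(f"{self.name}:pass{i}")

    def execute_now(self):
        """Public entry point for worlds, e.g. right after a camera has rendered."""
        self.record(self.own_list)
        return self.own_list.execute()


class BlitManager:
    """Collects registered controllers and dispatches them in one shared list."""

    def __init__(self):
        self.controllers = []
        self.shared_list = CommandList()

    def register(self, controller):
        self.controllers.append(controller)

    def update(self):
        for controller in self.controllers:
            if not controller.manual:
                controller.record(self.shared_list)
        return self.shared_list.execute()


manager = BlitManager()
particles = BlitController("particles", passes=2)
trail = BlitController("trail", passes=1)
mirror_sim = BlitController("mirror_sim", passes=1, manual=True)
for controller in (particles, trail, mirror_sim):
    manager.register(controller)

frame_commands = manager.update()
manual_commands = mirror_sim.execute_now()
```

The automatic controllers share one command list execution per frame, while the manual controller keeps its own list so a world can trigger it at a specific time without waiting for the manager.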
todo