Recently I got interested in Blender 3D, partly inspired by infinigen project.
One day I encountered Tellusim. Impressed by the quality of its rendering, I was browsing its blogs and see this. Wow, Tellusim really blowed others out of the water;others including Unreal, Unity, Omniverse and Blender. Wait, Blender is really that slow importing a USD scene?
Since Blender is open-source, why not try to figure out what's going on? Here we go.
I cloned the latest version of Blender 4.0.0.Alpha and installed all dependencies. Building is a straightforward thing. I am working on Windows 10 and miss perf
on Linux. But VS2022 community edition does include a profiling tool: Microsoft (R) VS Standard Collector
. Go to the directory where blender.exe is located and issue:
VSDiagnostics start 1 /launch:blender.exe /loadConfig:"C:\Program Files\Microsoft Visual Studio\2022\Community\Team Tools\DiagnosticsHub\Collector\AgentConfigs\CpuUsageBase.json"
Now in blender
, import limits_32.usdc
unzipped from this test file. It's going to take a while and on my machine that is about 30 seconds. Once importing is done, run
VSDiagnostics.exe stop 1 /output:"usd_import1"
File usd_import1.diagsession
will have profiling information which you can examine in VS by opening it.
This is what it looks like on my system. If you want, a flame graph is also available.
Now, it should be clear that most of the time (28s-53s on the time line) of importing that particular USD file is spent in id_sort_by_name
. But why? What exactly does id_sort_by_name
do? It took me a while to figure out with help from the comments in the code. Blender creates many objects (its type is ID
) when importing scenes or users creating primitives. Internally, Blender maintains a doubly linked list of these ID's. The list is sorted by ID's unique name alphebatically. Everytime a new object (ID) is created, id_sort_by_name
is called to inserted the newly created ID into the list. Because the list is sorted and Blender assumes all objects created later have higher extension number (i.e., alphebatically bigger), the search is done backwards starting from the end. Fine, this sounds reasonable as each insertion most likely happens at the list end which is O(1) and the whole process is O(N) with N being number of new IDs. But how does profiling tell a different story?
All the usd files having performance issue have something in common: they have one entity shared by many objects. Blender's USD import utility creates a new object for each reference. Because they all have the same name, another process came in to assign a number to objects which have names already registered to make them unique. The problem is the assignment is based on a counter starting from 1. It ends up like this: an ID with name basename.12313
is alphabetically smaller than basename.9999
although the latter comes later than the former. So the linked list search needs to take a lot more steps to find a place to insert basename.12313
. Effectivelly, an O(N) process becomes O(N^2). We all know how bad O(N^2) is, right?
Up to now, the solution should also be clear: change the naming methods. Instead of changing the general name mangling function Blender uses, I made some modification directly to the USD import module: I count the total number of objects in a USD file and get the width of total digits, then for each unique basename, assign a new name with the format basename_00XXX
where XXX is their index and use 0 padding so they all have the same number of digits. This way ID_00999
will be near the head of the list, not the tail.
The above solution turns out to be not ideal. It couldn't handle importing the same USD file over and over or multiple ones all with the same basename(back to O(N^2) very quickly). Given that the process of assigning extension number uses a counter for each basename, I came out another solution. In id_sort_by_name
, instead of comparing the whole name (basename + number), I split it into two parts: first compare basename; and if basename is the same, compare number. This fix gave similar performance as before (see next), while allowing importing arbitrary number of files.
After the above ad-hoc fix, loading limits_32.usdc
took 8 seconds and a even bigger file limits_48.usdc
took 27 seconds while before fix it spent a whole 4 min 40 sec. Not bad.
Now id_sort_by_name
is not the bottleneck anymore. Other processes stand out, e.g. USDMeshReader::read_object_data
. For bigger files, building the DAG becomes more time consuming. There may be room to further cut down some time (in another post maybe) such that the whole Blender USD import pipeline matches Omniverse 2023.1
. However, I doubt Blender is ever going to reach the level close to Tellusim
.
Thanks for reading.
Ah yeah, the "once other bottlenecks are solved, something new becomes the bottleneck" -- I ran into this when optimizing USD importing into Blender using Disney Moana test scene (https://aras-p.info/blog/2022/07/20/Swallowing-the-elephant-into-Blender/).
Various things got addressed there, but the "IDs are sorted by name" remained. Initially I had another hack/workaround, i.e. to provide ability to disable sorting by name, create a bunch of new IDs (as an importer would do), and then in the end re-sort everything again. This was (correctly) deemed too brittle during code review (https://archive.blender.org/developer/differential/0014/0014162/#403619) and a possible future avenue was left to write functions to "create many IDs at once". An importer like USD could then potentially use that to create thousands/millions of IDs in one go, and internally that would sort using some efficient method. Actually doing that was left as a future exercise back then :)
The idea of changing ID_sort_by_name to operate differently on base name + suffix seems clever! Would love a pull request to actual Blender and/or reviewing it!