Skip to content

Instantly share code, notes, and snippets.

@ExpHP
Last active July 6, 2022 11:21
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ExpHP/f275e0edc02603580f24a5ba3da952cc to your computer and use it in GitHub Desktop.
Save ExpHP/f275e0edc02603580f24a5ba3da952cc to your computer and use it in GitHub Desktop.
AnmManager and TH17 1.00b

The ANM manager

TH16 can crash when a large number of enemies and bullets/lasers are destroyed, ruining some score runs. Notably, TH17 1.00a can also crash, but the bug was finally fixed in 1.00b.

Motivated by wanting to understand this crash and perhaps repair it for TH16, I made my best effort to look into the changes between TH17 1.00a and 1.00b. I have not yet figured out the true cause of the crash, but my efforts to discover the cause have forced me to understand even more about how ANMs work.
(Edit 2020/10/07: The reason for the crash is now known!)

Thanks to this, I have now mapped out a significant portion of AnmManager, the single largest struct ever allocated by the game.

The AnmManager structure

We'll start with TH16, as this is the code I'm most familiar with.

In TH16

// TH 16
struct AnmManager __packed
{
    /* 0x0000000 */ char __UNKNOWN[216];

    // During AnmManager's on_tick functions, this counts the number of ANMs that have been
    // encountered so far.  I don't think it's used.  I only mention it because it's touched
    // by code that's otherwise pretty important.
    /* 0x00000d8 */ int32_t useless_count;

    // These are used by AnmManager's on_tick handlers.
    /* 0x00000dc */ struct ZunListNode<AnmVm>* world_list_head;
    /* 0x00000e0 */ struct ZunListNode<AnmVm>* world_list_tail;
    /* 0x00000e4 */ struct ZunListNode<AnmVm>* ui_list_head;
    /* 0x00000e8 */ struct ZunListNode<AnmVm>* ui_list_tail;

    // This is discussed soon.
    /* 0x00000ec */ struct AnmFastVm fast_array[8191]; // 8191 = 0x1fff

    // Leftovers from LoLK.
    // When PointDevice hits a checkpoint, all anm vms meaningful to gameplay are replicated into
    // a backup set of vms, so they can be reloaded and/or saved to an autosave file.
    /* 0x0c27ad8 */ int32_t __lolk_next_snapshot_fast_id;
    /* 0x0c27adc */ char __UNKNOWN[4];
    /* 0x0c27ae0 */ AnmVmList __lolk_vm_snapshot_list_head;
    /* 0x0c27af0 */ struct AnmFastVm __lolk_fast_snapshot_array[8191];

    /* 0x184f4dc */ struct ZunListNode<AnmFastVm> freelist_head;
    /* 0x184f4ec */ int32_t __UNKNOWN_OR_UNUSED;

    // An array of pointers to structures representing parsed .anm files.
    // Everything has a special place in here; e.g. front.anm goes in index 05,
    // and the current player character's anm goes in index 09.
    /* 0x184f4f0 */ struct AnmLoaded* preloaded[31]; // parsed anm files

    // Most of this stuff seems to only be used in low level rendering code.
    // I doubt that any of it has changed meaningfully from TH16 to TH17.
    /* 0x184f56c */ struct D3DMatrix __matrix_184f56c__used_during_rendering;
    /* 0x184f5ac */ struct AnmVm __vm_184f5ac; // unknown purpose

    // Some fields that don't appear to have changed between TH16 and TH17.
    // They're used during rendering and mostly consist of two large D3D vertex buffers.
    // (actually, I have mapped this out, at least for TH17; see addendum)
    /* 0x184fba8 */ char __UNKNOWN[0x420088];

    // These are used by AnmManager's on_draw callbacks prior to TH17 1.00b.
    // Each one is only used for a single, 32-bit field: 'next_in_layer' at offset 0x5a8
    /* 0x1c6fc30 */ struct AnmVm layer_list_dummy_heads[43];

    /* 0x1c7fd84 */ int last_id_discriminator; // 19 bit integer, discussed soon
    /* 0x1c7fd88 */ char __UNKNOWN[8];
};  /* 0x1c7fd90 total size */

// a very common type
template <typename T>
struct ZunListNode {
    T* entry;
    struct ZunListNode<T>* next;
    struct ZunListNode<T>* prev;
    // sometimes this is copied between list nodes.  All I can say is that
    // the ZunListNode type is used literally *everywhere* and I've never once
    // seen this field used for anything meaningful, so don't worry about it.
    struct ZunListNode<T>* __seldom_used;
}

Changes after TH16

In TH17 1.00a, AnmManager has a single difference from TH16, which is the addition of an AnmVm at offset 0xd8 (right before the on_tick lists). Its purpose is unknown, and it was all zeros when I looked at it.

In TH17 1.00b, AnmManager is 0x106c4 bytes larger than in TH17 1.00a. The breakdown is as follows:

  • 0x100ac bytes come from the addition of a 4 byte field at offset AnmVm+0x5a8 to all ANM VMs. To my knowledge, this field is always zero. It may have been part of a failed attempt to fix the crash present in TH16 and TH17 1.00a.
  • The leftover AnmFastVm array from LoLK has been removed.
  • "Fast ids" (more on this soon) are now one bit larger, causing fast_array to grow to slightly more than double its size. Together with the above bullet, this leads to a net addition of a single AnmFastVm, which accounts for the remaining 0x618 bytes.
// TH 17 1.00b
struct AnmManager __packed
{
    /* 00000000 */ char __UNKNOWN[216];

    // This is even more useless now.  With the changes in 1.00b, it is incremented on draw,
    // and will ultimately count the product of (num_layers * num_vms).  THE MORE YOU KNOW
    /* 000000d8 */ int32_t useless_counter;
    /* 000000dc */ struct AnmVm __anm_vm_dc; // present in 1.00a, unknown purpose

    // Beginning in 1.00b, these are used by BOTH the on_tick and on_draw callbacks.
    /* 000006dc */ struct ZunListNode<AnmVm>* world_list_head;
    /* 000006e0 */ struct ZunListNode<AnmVm>* world_list_tail;
    /* 000006e4 */ struct ZunListNode<AnmVm>* ui_list_head;
    /* 000006e8 */ struct ZunListNode<AnmVm>* ui_list_tail;

    // Bigger now, as mentioned above.
    /* 000006ec */ struct AnmFastVm fast_array[16383]; // 16383 = 0x3fff
    
    // Most likely, the other leftover LoLK fields are still here; but it's impossible to tell,
    // since any code that would have used them was stripped out.
    /* 018600d4 */ char __UNKNOWN[24];

    // blah blah, unchanged
    /* 018601ec */ struct ZunList<AnmFastVm> freelist_head; // only used for its 'next' field
    /* 018608fc */ int __UNKNOWN_OR_UNUSED;
    /* 81860100 */ struct zAnmLoaded* preloaded[31];
    /* 0186017c */ struct D3DMatrix __matrix_186017c__used_during_rendering;
    /* 018601bc */ struct AnmVm __vm_18601bc;
    /* 018607bc */ char __UNKNOWN[0x420088];
    
    // these are unused now
    /* 01c80844 */ struct AnmVm layer_list_dummy_heads[43];
    
    /* 01c90a44 */ int32_t last_id_discriminator; // now 18 bits
    /* 01c90a48 */ char __UNKNOWN[8];
};  /* 01c90a50 total size */

VM preallocation, IDs, and the "fast path"

(the following section applies equally to TH16 and TH17)

On AnmManager, there is a giant array of 8191 preallocated VMs (16383 in TH17 1.00b) that I have named fast_array. These arrays hold structures that are just slightly larger than a VM:

struct AnmFastVm {
    /* TH16   TH17  */
    /* 0x000  0x000 */ struct AnmVm vm;
    /* 0x5fc  0x600 */ struct ZunListNode<AnmFastVm> freelist_node;
    /* 0x600  0x610 */ int32_t is_alive;
    /* 0x610  0x614 */ int32_t fast_id;  // index of this AnmFastVm in fast_array
}   /* 0x614  0x618 total size*/

Now, this is not just an optimization to avoid some mallocs and frees! Crucially, this preallocated array exists to optimize lookups of ANM VMs from their IDs; many game entities store ANM IDs instead of pointers to prevent usage after free when an Anm VM runs to completion and deletes itself, and this array allows these checks to be performed with O(1) memory lookups for the vast majority of gameplay.

Here's how it works:

  • ANM IDs are split into two parts: the fast id and the discriminator.
    • The lower 13 bits (14 bits in TH17 1.00b) are the fast id, copied from AnmFastVm.fast_id. As long as this value is in the range 0 <= fast_id <= 0x1ffe (0x3ffe in TH17 1.00b), the game knows exactly where the VM is (in fast_array), and can further know for a fact that the memory has not been freed. (so it is safe to double-check its id field)
    • The next 19 bits (18 bits in TH17 1.00b) are the discriminator. This is a nonzero integer that increases (with wraparound) every time a VM is created, and its purpose is to guarantee that the ID field of a preallocated VM will change if it is destroyed and then reused by something else. This is not just here for doomsday scenarios; after all, the (cache-efficient) FIFO nature of the freelist means that the most recently deleted VMs are the ones most likely to be reused by something else!

The slow path

Normally, the freelist provides instant access to the next unused AnmFastId, making "allocation" fast as well. But, if somehow all 0x1fff (or 0x3fff) of these VMs manage to get used and the game needs to create even more, they will use the slow path: When the freelist is empty, new VMs will be individually malloced on the spot (as AnmVms, not AnmFastVms) and given a fast_id of 0x1fff (0x3fff in TH17 1.00b). When the game needs to look up such a VM from its ID, it will search all living VMs by scanning the entire world_list and ui_list.

The LoLK fields

TH17 1.00b's change to AnmManager's on_draw functions

There is another change in TH17 1.00b.

  • As detailed here, AnmManager has two on_tick callbacks (one for world_list and one for ui_list) and a ton of on_draw callbacks (one for every layer).
  • Prior to TH17 v1.00b, each on_tick function was responsible for repopulating some of the lists stored for each layer at AnmManager.layer_list_dummy_heads. Then, the on_draw callbacks would each simply iterate over one of those lists.
  • As of TH17 v1.00b, those lists are no longer populated or used. Every single on_draw callback now iterates over both the on_world and on_tick lists, rendering all of the VMs that have the desired layer.
    • Note it performs the same layer mappings as mentioned in the above linked document when checking the layer of each VM (though this time it doesn't update the layer field in-place). This means that, for instance, if a VM in the world_list has layer 36 (which is normally a UI layer), it will be rendered by the on_draw function for layer 24, and not by the one for layer 36.
    • (In other words, everything is still rendered at the same priority as before the patch; when the on_draw func for a world layer iterates over the ui list, simply none of the VMs will be found to have a matching layer. The on_draw funcs just search both lists out of laziness basically)

Addendum 2020/10/07: Reason for the crashes

Working together with priw8 and sp0thss, I am now fairly confident that the cause for this crash is understood: As described in my other gist, AnmManager's on_tick functions build the layer lists for drawing. More specifically, each VM is ticked, and one of two things happens based on the return value of the ANM parser:

  • If the parser returns 0, the VM is prepended to its layer list.
  • Otherwise, the VM and all of its children are destroyed (destructor is called, memory is potentially freed).

So basically, the scenario is as follows:

  1. A VM that is a child runs during on_tick and gets added to its layer list.
  2. Later in this same function on the same frame, the parent runs and is deleted, which also deletes the child.
  3. During on_draw, the (now potentially deallocated) child is still in the layer list, and game segfaults due to a use-after-free.

Specifically, the problematic scripts appear to be not the bullet cancel script as you might imagine, but rather, the enemy death scripts! Many of these death scripts have children that outlive parents, and if you modify them so that the parent lives longer, then the crashes stop. (there may be other problematic scripts too that we don't know about)

Why does it only show up when there are many, many bullets being cancelled? I believe this is because it heavily stresses the slow path. Basically, even though fast-path VMs would have their destructors called in step 2, accessing them doesn't produce a memory access exception because, as far as the system is concerned, they're still allocated. And even once we hit the fast path limit, there may additionally be some effects such as some slow path VMs sharing memory pages with other allocations, allowing the memory to still be used after the VM is freed.

Why do the changes in 1.00b fix the crash? Basically, the layer lists are singly-linked, while the world/ui lists are doubly-linked. The destructor for VMs removes a VM from the world/ui list, but not from the layer list (because you can't remove from a singly-linked list!). Therefore, in step 3 of the above scenario, v1.00b does not encounter the deallocated child VM because it's using the world/ui lists.

It seems to me that what ZUN should have done to fix the issue is to simply change the layer lists to be doubly-linked so that they can be removed. I believe the current solution in 1.00b has some pretty bad implications for performance, but there's not much we can do about it now.

Addendum 10/11/2020 - Another bug!

There's an additional bug (at least in TH15)! If you quit the game with many sprites on the screen, then when vms are being all destroyed, if a parent is destroyed before the first child in its child list, then the game will crash when it destroys the child, as the child tries to write to a linked list node on the parent when removing itself from its list of siblings.

The bug here would appear to be that ZUN forgot to have parent vms remove themselves from these lists. And once again, the crash doesn't normally happen because there usually isn't enough vms for memory deallocation to occur.

I imagine this bug still exists in TH17 v1.00b, and will be easy to run into once the bullet_cap patch supports this game.

Small addendum: that 0x420088 size block

I actually did map out most of that one superlarge chunk of unknown data in TH17. It's just that (a) I don't want to have to double check everything in TH16 just to make sure it's identical, (b) binja is mean and doesn't let you copy struct field offsets so last time I had to use OCR just to make that column of field offset comments and hrrrrnaggaghagasdf why, and (c) it's not really important to the main point of this document.

But, here's what it contains in TH17 v1.00b (which most likely is also identical to TH16).

// there's still some bizarre stuff here. Something that looks like a 4*5
// floating point matrix or something. It's weird.
char __still_unknown[112];

// Texture-based sprites are written into a giant array of vertex data in the form of a
// D3D triangle list (each sprite requiring 2 triangles, or 6 vertices total).
// Many sprites may be written at once before any are rendered, so a count is kept of
// how many should be drawn on the next render call.
int32_t d3d_unrendered_sprite_count;
struct zRenderSpriteVertex d3d_sprite_vertex_data[0x20000];
struct zRenderSpriteVertex* d3d_sprite_write_cursor;
struct zRenderSpriteVertex* d3d_sprite_render_cursor;

// A matching set of four fields exist for drawing primitives (e.g. anm 600 series).
// However, primitives are always drawn immediately; hence, the render cursor and
// unrendered count are unused.
int32_t d3d_unrendered_primitive_count__always_0;
struct RenderPrimitiveVertex d3d_primitive_vertex_data[0x8000];
struct RenderPrimitiveVertex* d3d_primitive_write_cursor;
struct RenderPrimitiveVertex* d3d_primitive_render_cursor__unused;

where the two vertex types are

// Format of D3DFVF_XYZRHW | D3DFVF_TEX1 | D3DFVF_DIFFUSE
struct RenderSpriteVertex __packed {
    struct Float4 transformed_pos;
    DWORD diffuse_color_argb;
    struct zFloat2 texture_uv;
}

// Format of D3DFVF_XYZRHW | D3DFVF_DIFFUSE
struct RenderPrimitiveVertex __packed {
    struct Float4 transformed_pos;
    DWORD diffuse_color_argb;
}

Changelog

2020-10-11

  • Added another manifestation of the crash bug! Noticed it in TH15, and I bet it still exists in TH17 v1.00b.

2020-10-07

  • Added explanation for the crashes. Hooray!

2020-06-11

  • Fixed mistakes in discriminat bitsize (I misread the code). All 32 bits of the anm ID are used, not just 28.
  • The purpose of unknown_array_2 is now known; it's a leftover from LoLK PointDevice. Details about these LoLK leftovers have been added.
  • Added a document with the other 0x420088 block.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment