apple1417/unreal_introspection.md

    Crash Course in Unreal Engine Introspection

Table of Contents


Intro
Names

UE3
UE4
UE4.23 - FNamePool
Using GNames


Object Structure
Interpreting Properties

ObjectProperty
StructProperty
ArrayProperty
StrProperty


Intro

Unreal objects are highly introspective. Knowing only a few offsets on base classes, you can follow
this introspection, and convert from raw property names to the actual offsets needed to
reconstruct a pointer.
This guide assumes the following:

You have a working dumper, which gives you object names and their addresses
You have a pointer/signature to GNames - any tool which can give you object names will have found
it, you just need one which actually exposes it too
You have a pointer to some static base object such as GEngine or GWorld, which you can use as a
starting point to get to the fields you want - just find them in your dumps then pointer scan
The base fields you need to know have completely unknown offsets, which you need to reverse
engineer too. This isn't ever really the case, especially if you have the source code of your
dumper, but assuming it is will help for when one does move.

Names

The first concept you need to understand is names. Names are strings which are expected to hold
mostly constant data. All names are stored in a big string table, GNames, and name fields on
objects just hold an index into it, allowing for very efficient comparisons. You may see them
referred to as FNames in Unreal documentation, or name fields in UnrealScript.
struct FName {
    int32_t index;
    int32_t number;
};
Given its simplicity, this struct is very unlikely to change between UE versions.
If you autoguess offset types in Cheat Engine, names will generally appear as about a 5-digit hex
value (index), followed by a decimal value which is typically (but not always) zero (number).
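As a quick illustration of that layout, here is a minimal sketch that splits 8 raw bytes into the two fields. It assumes little-endian int32s, and the sample bytes are made up for the example rather than taken from a real game.

```python
import struct

def parse_fname(raw: bytes) -> tuple:
    """Unpack (index, number) from the first 8 bytes, assuming little-endian int32s."""
    index, number = struct.unpack_from("<ii", raw)
    return index, number

# Hypothetical bytes, as you might copy them out of a memory viewer.
sample = bytes.fromhex("5a3c010000000000")
index, number = parse_fname(sample)
print(hex(index), number)  # a 5-digit hex index followed by a zero number
```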
So how do you convert names back into their actual string? This is what you need that GNames pointer
for. The structure of this has changed several times.
UE3

template <typename T>
struct TArray {
    T* data;
    int32_t size;
    int32_t max;
};

struct FNameEntry {
    bool is_wide : 1;
    int32_t index : 31;

    union {
        char ansi[];
        wchar_t wide[];
    };
};

TArray<FNameEntry*> GNames;
For clarity, if you read this initial is_wide/index bitfield as a single int32, is_wide is
always bit 0, and index is always bits 1-31. The actual Unreal code handles it a little differently
to ensure this is always the case, regardless of compiler. In practice, you can get away with
assuming all strings are wide, so you can mostly ignore this.
You should notice all the FNameEntrys are allocated in a single block, so all the pointers should be
quite close to each other. All the strings are of their minimal size (with a null terminator), so
each entry starts immediately after the last (allowing for 4-byte alignment). You will probably find
there's some extra padding in the actual structs, but it should be quite easy to pick out where the
strings actually begin.
To access an entry, simply index the TArray - *GNames.data[idx].
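The lookup can be sketched roughly as follows. This is not a definitive implementation: FakeMemory is a hypothetical stand-in for whatever process reader you use, and the 64-bit pointer size and the string starting 4 bytes into the entry are assumptions you would need to check against your game.

```python
import struct

POINTER_SIZE = 8   # assumption: 64-bit game
STRING_OFFSET = 4  # assumption: no padding between the bitfield and the string

class FakeMemory:
    """Hypothetical process reader: serves bytes from a buffer mapped at `base`."""
    def __init__(self, base: int, data: bytes):
        self.base, self.data = base, data

    def read(self, addr: int, size: int) -> bytes:
        start = addr - self.base
        return self.data[start:start + size]

def read_ptr(mem, addr):
    return struct.unpack("<Q", mem.read(addr, POINTER_SIZE))[0]

def read_i32(mem, addr):
    return struct.unpack("<i", mem.read(addr, 4))[0]

def read_terminated(mem, addr, wide):
    """Read characters until a null terminator (or the end of readable memory)."""
    width = 2 if wide else 1
    out = bytearray()
    while True:
        char = mem.read(addr, width)
        if len(char) < width or not any(char):
            break
        out += char
        addr += width
    return out.decode("utf-16-le" if wide else "ascii")

def ue3_name(mem, gnames_addr, idx):
    data = read_ptr(mem, gnames_addr)                  # TArray.data
    size = read_i32(mem, gnames_addr + POINTER_SIZE)   # TArray.size
    if not 0 <= idx < size:
        raise IndexError(idx)
    entry = read_ptr(mem, data + idx * POINTER_SIZE)   # GNames.data[idx]
    is_wide = bool(read_i32(mem, entry) & 1)           # bit 0 of the bitfield
    return read_terminated(mem, entry + STRING_OFFSET, is_wide)

def make_entry(index, text, wide):
    """Build a fake FNameEntry, padded to 4-byte alignment."""
    raw = struct.pack("<I", (index << 1) | int(wide))
    raw += (text + "\0").encode("utf-16-le" if wide else "ascii")
    return raw + b"\0" * (-len(raw) % 4)

# Fake layout: TArray header, then the pointer table, then the packed entries.
BASE = 0x1000
entries = [make_entry(0, "None", False), make_entry(1, "ByteProperty", True)]
table = BASE + 16
first = table + POINTER_SIZE * len(entries)
blob = struct.pack("<Qii", table, 2, 2)
blob += struct.pack("<QQ", first, first + len(entries[0]))
blob += b"".join(entries)
mem = FakeMemory(BASE, blob)
print(ue3_name(mem, BASE, 0), ue3_name(mem, BASE, 1))
```

The sketch checks the is_wide bit rather than assuming wide strings, which costs one extra read per entry.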
UE4

template <typename ElementType, int32 MaxTotalElements, int32 ElementsPerChunk>
struct TStaticIndirectArrayThreadSafeRead {
    ElementType** objects[(MaxTotalElements + ElementsPerChunk - 1) / ElementsPerChunk];
    int32_t count;
    int32_t chunk_count;
};

struct FNameEntry {
    bool is_wide : 1;
    int32_t index : 31;
    FNameEntry* next;

    union {
        char ansi[];
        wchar_t wide[];
    };
};

TStaticIndirectArrayThreadSafeRead<FNameEntry, 0x400000, 0x4000> GNames;
This may look a little complex, but simplifying it down, objects is just a 0x100-element array, of
pointers to 0x4000 element arrays, of pointers to FNameEntrys. You can index through these using
*GNames.objects[idx / 0x4000][idx % 0x4000].
The FNameEntrys are basically the same as in the previous version.
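A minimal sketch of the two-level index, assuming 64-bit pointers and that objects_addr is the address of the objects array itself. The read_ptr callable and the addresses in the fake pointer table are hypothetical.

```python
ELEMENTS_PER_CHUNK = 0x4000
POINTER_SIZE = 8  # assumption: 64-bit game

def ue4_entry_address(read_ptr, objects_addr, idx):
    """Return the address of the FNameEntry for idx, following both pointer levels."""
    chunk_idx, within_idx = divmod(idx, ELEMENTS_PER_CHUNK)
    chunk = read_ptr(objects_addr + chunk_idx * POINTER_SIZE)
    if chunk == 0:
        raise IndexError(f"chunk {chunk_idx} is not allocated")
    return read_ptr(chunk + within_idx * POINTER_SIZE)

# Fake pointer table: chunk 1 lives at 0x5000, and its third slot points at 0x9000.
fake_pointers = {0x1000 + 1 * 8: 0x5000, 0x5000 + 2 * 8: 0x9000}
entry = ue4_entry_address(fake_pointers.__getitem__, 0x1000, 0x4002)
print(hex(entry))
```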
UE4.23 - FNamePool

Note I have no direct experience with this version, so this section is a bit more shaky than the
others.
struct FName {
    uint16_t name_offset;
    uint16_t chunk_offset;
    int32_t number;
};

struct FNamePool {
    uint8_t padding[0x10];
    uint16_t* data[];
};

struct name_metadata {
    uint16_t : 6;
    uint16_t size : 10;
};

FNamePool GNames;
So this version changed things up a lot - so much that structs don't really explain it that well.
The first big change is that the FName structs themselves don't really store a single int32 index
anymore. In fact, they do away with indexes altogether, and just store offsets. The lowest 16 bits
are the name offset, and the next 16 bits are the chunk offset. Presumably precalculating these makes
lookups more efficient.
Now you'll notice I never actually defined FNameEntry for this version. It isn't really needed
anymore. Each chunk is made up entirely of tightly packed strings. Rather than using a null
terminator as a separator, the strings have a leading 2-byte metadata value separating them. You
could think of this as a struct of a uint16 and a (w)char array, but that may lead to misleading
assumptions about indexing.
To look up an entry, you just follow the offsets - GNames.data[chunk_offset][name_offset]. Note
that unlike before, the two index operations are now looking through two different data types: the
chunk offset is indexing through 4/8-byte values, while the name offset is only indexing through
2-byte values. The address you arrive at is the metadata value for the following name. Most notably
for us, the uppermost 10 bits of the metadata contain the size of the name. Presumably one of the
other bits is is_wide, I don't know which. Read the metadata value, work out the size, then
increment the name offset and read that many more characters.
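The steps above might look something like this sketch. It works on chunks that have already been read into bytes objects, and since the position of the is_wide bit is unknown here, width is passed in by the caller as an assumption. The pack_entry helper and sample chunk exist only to build test data.

```python
import struct

def split_fname_offsets(raw_index: int):
    """Split the first 32 bits of an FName into (chunk_offset, name_offset)."""
    return (raw_index >> 16) & 0xFFFF, raw_index & 0xFFFF

def fnamepool_lookup(chunks, chunk_offset, name_offset, wide=False):
    chunk = chunks[chunk_offset]
    pos = name_offset * 2  # the name offset counts 2-byte units
    (metadata,) = struct.unpack_from("<H", chunk, pos)
    size = metadata >> 6   # uppermost 10 bits hold the character count
    width = 2 if wide else 1  # assumption: caller knows the encoding
    raw = chunk[pos + 2 : pos + 2 + size * width]
    return raw.decode("utf-16-le" if wide else "ascii")

def pack_entry(text):
    """Build a fake ANSI entry: 2-byte metadata, then the string, 2-byte aligned."""
    raw = struct.pack("<H", len(text) << 6) + text.encode("ascii")
    return raw + b"\0" * (len(raw) % 2)

# "None" takes 6 bytes, so "ByteProperty" starts at name offset 3.
chunk0 = pack_entry("None") + pack_entry("ByteProperty")
chunk_offset, name_offset = split_fname_offsets(0x00000003)
print(fnamepool_lookup([chunk0], chunk_offset, name_offset))
```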
Using GNames

So with all this, you should be able to convert a name index into its string. The first few entries
are hardcoded, if you want to test a few:

0 -> None
1 -> ByteProperty
2 -> IntProperty

In games using the FNamePool version, the indexes won't actually be 0-2, but those strings should
still be the first entries.
There are three other important things to know about GNames:

Names can be added at runtime, meaning some later strings may change their index from one run to
the next, based on what order things were loaded.
Once a name is in GNames, its index will never change - it's fine to cache them (and I'd
recommend it if you're running in a separate process).
If you're in a separate process, there's no efficient way to get the index of a name from its
string. If you're injected into the game process, you can find and call FName::Init.
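Since an index never changes once assigned, caching can be as simple as memoising the lookup. A minimal sketch, where slow_lookup and FAKE_GNAMES are hypothetical stand-ins for your real cross-process read:

```python
from functools import lru_cache

FAKE_GNAMES = {0: "None", 1: "ByteProperty", 2: "IntProperty"}
lookups = []  # records every time we actually "touch the game process"

def slow_lookup(index: int) -> str:
    lookups.append(index)
    return FAKE_GNAMES[index]

@lru_cache(maxsize=None)
def cached_name(index: int) -> str:
    return slow_lookup(index)

cached_name(1)
cached_name(1)  # served from the cache, no second read
cached_name(2)
print(lookups)
```

The cache is only valid for one run of the game, so it should be thrown away whenever the process restarts.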

There's still one more thing about names you may need to know about: the number. If you scroll
through the object dumps, you'll notice a lot of objects named something like PlayerController_12,
most commonly on very temporary objects. Since it'd be very inefficient to allocate a new name for
every temporary object, the number field is used to add an underscore and a numeric suffix. If
it's 0, there's no suffix, otherwise the suffix is the number minus one.
name_string = GNames.get(name.index)
if name.number != 0:
    name_string += "_" + str(name.number - 1)
Typically, object fields will not use numeric suffixes, so you may be able to get away with ignoring
them.
Object Structure

Now let's go over the data structures that let us actually do the introspection. It's easiest to
start by just looking at the structs.
class UObject {
    FName name;
    UObject* outer;
    UClass* class_;
};

class UField : public UObject {
    UField* next;
};

class UStruct : public UField {
    UStruct* super_field;
    UField* children;
    UProperty* property_link;
};

class UClass : public UStruct {};

class UProperty : public UField {
    int32_t element_size;
    int32_t offset_internal;
    UProperty* property_link_next;
};
Note that these are only the fields you might be interested in, these structs are definitely very
unlike what you'll actually find laid out in memory.
Your unreal object dumper will show you all subclasses of UObject, so you can recover most of the
actual offsets just by autoguessing fields in Cheat Engine, looking for pointers, then seeing what
those addresses correspond to in your dumps. More on what exactly you'd expect to see later.
As previously mentioned, FNames tend to show up as about a 5 digit hex value, generally followed
by decimal 0. UObject.name will appear quite close to the start of the object, typically right
next to one of the other listed pointers. You can then look up the name in GNames and compare against
what your dumper tells you to confirm that you've found the right offset.
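If you already know an object's name index from your dumper, this search can be automated. A rough sketch, assuming 4-byte alignment and little-endian values; the fake object bytes below are invented purely for the example:

```python
import struct

def find_name_offsets(obj_bytes: bytes, expected_index: int, expected_number: int = 0):
    """Return every 4-byte-aligned offset whose (index, number) pair matches."""
    hits = []
    for offset in range(0, len(obj_bytes) - 7, 4):
        index, number = struct.unpack_from("<ii", obj_bytes, offset)
        if index == expected_index and number == expected_number:
            hits.append(offset)
    return hits

# Hypothetical object dump: vtable, flags, name (index + number), outer pointer.
fake_object = struct.pack("<QQiiQ", 0x7FF612345678, 0x41, 0x13C5A, 0, 0x1F0A2B3C4D0)
print([hex(o) for o in find_name_offsets(fake_object, 0x13C5A)])
```

Running it against several different objects and keeping only the offsets common to all of them helps weed out coincidental matches.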
The fields on UProperty are a little trickier. element_size typically appears near the start of
the property, near the property flags, which should appear as a hex int32 bitmap. You can try looking
up properties of known sizes to confirm - an IntProperty will always be 4 bytes and a NameProperty
will always be 8. offset_internal typically shows up a little later in the object, nearer to
property_link_next, and will typically be about a 3 digit hex value. If you've already pointer
scanned an offset, use that to confirm you're reading the right thing, otherwise confirming it is
mostly just a matter of trying it and seeing if it makes sense.
Now once you've worked out all the fields, there are a number of useful linked lists you can follow:


UObject.outer.outer - Outer objects. These don't actually affect anything, but they're used to
get the full object "path name", which is handy for debugging. The exact path logic is as follows:
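def get_path_name(obj: UObject) -> str:
    # Field names follow the structs above; names are assumed already resolved through GNames.
    if obj.outer is None:
        return obj.name
    outer = obj.outer
    # ":" marks a subobject: the outer is not a package, but the outer's outer is.
    outer_is_package = outer.class_.name == "Package"
    grand_is_package = outer.outer is not None and outer.outer.class_.name == "Package"
    separator = ":" if not outer_is_package and grand_is_package else "."
    return get_path_name(outer) + separator + obj.name
The separator logic of course isn't particularly important, so you can choose to ignore it.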


UStruct.super_field.super_field - Chain from most to least derived struct. Most common for
classes, ending in the UObject class, but sometimes structs have inheritance too (e.g. Plane2D
inherits Vector2D).


UStruct.property_link.property_link_next.property_link_next - All properties on a struct,
including on the less derived structs. Properties are the actual values stored in memory after the
start of the struct, these are what you want to get the offsets of.


UStruct.children.next.next - All fields on this struct in particular. Fields include
properties, but also other things such as inner structs, functions, enums, and constants. This
chain does not include the less derived structs, so you probably need to combine it with the
super field one.


So to iterate through properties you have two choices - property_link, or super_field +
children. I would recommend the former, as it tends to be quicker, but you may find the latter
easier to find (especially since the offsets are smaller) or to work with.
Now finally, how do you find the specific property you're interested in? Just compare against the
name. The name field on the property object is the name of the property. So to bring it all
together, let's say you have a pointer to GEngine, and you want to find the GameInstance field:
prop = get_gengine().class_.property_link
while prop:
    if prop.name == "GameInstance":
        break
    prop = prop.property_link_next
Of course you'll need to add some error checking/retrying on failure.
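For comparison, the super_field + children route could be sketched like this. The dataclasses are hypothetical stand-ins for objects read out of memory, and telling properties apart from other fields by checking whether the class name ends in "Property" is an assumption of this sketch, not something the engine guarantees.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class FakeField:
    name: str
    class_name: str  # name of the field's own class, e.g. "ObjectProperty"
    next: Optional["FakeField"] = None

@dataclass
class FakeStruct:
    name: str
    children: Optional[FakeField] = None
    super_field: Optional["FakeStruct"] = None

def iter_properties(struct: Optional[FakeStruct]) -> Iterator[FakeField]:
    """Walk children.next.next on each struct, then move up via super_field."""
    while struct is not None:
        field = struct.children
        while field is not None:
            if field.class_name.endswith("Property"):  # assumption: skips functions etc.
                yield field
            field = field.next
        struct = struct.super_field

def find_property(struct: FakeStruct, name: str) -> Optional[FakeField]:
    return next((f for f in iter_properties(struct) if f.name == name), None)

# Made-up class layout for illustration only.
engine = FakeStruct(
    "Engine",
    children=FakeField("Init", "Function", next=FakeField("GameInstance", "ObjectProperty")),
)
game_engine = FakeStruct(
    "GameEngine",
    children=FakeField("MaxDeltaTime", "FloatProperty"),
    super_field=engine,
)
print(find_property(game_engine, "GameInstance"))
```

Returning None for a missing property leaves the retry policy up to the caller.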
Interpreting Properties

So you can get from an object to the property objects you're interested in. What next?
As you may have already guessed, UProperty.offset_internal is the offset from the start of the
object to that property's data. For a lot of property types, the data stored here is pretty
self-evident - a DoubleProperty stores a double at that offset, a NameProperty stores an FName,
etc. There are four main special cases you probably want to know.
ObjectProperty

Object properties might seem like another one of those simple types - the value at the offset is a
pointer to another object. However, object properties have an extra useful field.
class UObjectProperty : public UProperty {
    UClass* property_class;
};
This field, unsurprisingly, holds the class which this property accepts. In practice, it lets you
jump directly to the next class, to start working out offsets on it, without having to wait for any
objects to be constructed.
There is one caveat to this - it only holds the least derived class the property accepts.
Sometimes you want to get one of the more derived classes instead. One common example of this is
localplayer.PlayerController - this field holds an instance of a PlayerController, but you
probably want to read fields off of a game specific subclass of PlayerController instead.
You have two real options here:

Find another pointer path which is restricted to the more derived class. This is preferred, but
not always possible.
Wait for the property to get populated with an instance of the more derived class, then read the
class off of it.

You could also try iterating through GObjects, the global array of all unreal objects, looking for
the specific class object, but this can easily involve tens of millions of string comparisons. If
you're injected into the game process, you could try to find and call StaticFindObject to optimize
a little, but it's still probably worse than one of the other two options.
StructProperty

Struct properties consist of a blob of data holding the struct contents. How do you parse it? You
have to start by reading an extra field off of the property again.
class UScriptStruct : public UStruct {};

class UStructProperty : public UProperty {
    UScriptStruct* struct_;
};
After reading the struct off of the property, you can then parse through its inner properties in
the exact same way to get to its offsets. Note that these offsets are relative to the start of the
struct - the same struct can be used in a number of different places. You'll have to add the offset
of the struct to the offset of its inner properties.
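The offset addition can be sketched as a walk over a dotted path. FakeProp is a hypothetical, already-parsed view of the property objects, and the offsets in the example are invented.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FakeProp:
    offset: int  # offset_internal, relative to the containing object or struct
    inner: Dict[str, "FakeProp"] = field(default_factory=dict)  # set for struct properties

def resolve_offset(props: Dict[str, FakeProp], path: str) -> int:
    """Sum offset_internal along a path such as "Location.Y"."""
    total = 0
    for part in path.split("."):
        prop = props[part]
        total += prop.offset
        props = prop.inner  # descend into the struct's own properties
    return total

# Made-up layout: a Vector struct embedded 0x1A0 bytes into a pawn.
vector = {"X": FakeProp(0x0), "Y": FakeProp(0x4), "Z": FakeProp(0x8)}
pawn = {"Health": FakeProp(0x90), "Location": FakeProp(0x1A0, vector)}
print(hex(resolve_offset(pawn, "Location.Y")))
```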
ArrayProperty

Array properties bring back a template we briefly mentioned during parsing GNames.
template <typename T>
struct TArray {
    T* data;
    int32_t size;
    int32_t max;
};
Given its simplicity, this struct is very unlikely to change between UE versions.
Note that you only need to care about max if you're writing to the array. You can otherwise ignore
it if you're only reading from it, but it's handy to use to find these arrays while browsing memory,
since it will always be >= size, and will generally be a power of two.
The offset you read off of the array property will point to the start of a TArray. But we still
need to know the templated type: how do we know where to look for each element? We need to read
another value off the property again.
class UArrayProperty : public UProperty {
    UProperty* inner;
};
This time, we're not interested in the offset on the property (it should be 0), instead we want to
read the element_size field. The ith element of an array will be at offset i * element_size
within the data.
It's common for structs to be nested inside arrays (and vice versa). In this case, remember that
each struct is a separate array element, so a field on the struct is at offset
(i * element_size) + struct_offset within the data.
StrProperty

String properties hold an arbitrary, generally user provided string. They are essentially just a
specialization of an ArrayProperty (though don't actually inherit from it). They don't have any
special fields on their class you need to be aware of.
class UStrProperty : public UProperty {};

using FString = TArray<wchar_t>;
These strings should be null terminated, so you can get away with ignoring the size field.
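A minimal sketch of reading one, assuming 64-bit pointers and 2-byte wchar_t (as on Windows). It does use the size field to bound the read, then cuts at the first null. The read callable and the fake memory layout are hypothetical.

```python
import struct

def read_fstring(read, addr: int) -> str:
    """Read an FString whose TArray header sits at addr; read(addr, n) returns n bytes."""
    data, size, _max = struct.unpack("<Qii", read(addr, 16))
    if data == 0 or size <= 0:
        return ""  # empty strings typically have no allocation at all
    raw = read(data, size * 2)  # assumption: 2 bytes per character
    return raw.decode("utf-16-le").split("\0", 1)[0]

# Fake memory: the header at BASE, followed directly by the character data.
BASE = 0x1000
text = "Hello\0".encode("utf-16-le")
blob = struct.pack("<Qii", BASE + 16, 6, 8) + text + b"\0" * 4

def fake_read(addr: int, n: int) -> bytes:
    return blob[addr - BASE : addr - BASE + n]

print(read_fstring(fake_read, BASE))
```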