Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save vtsingaras/74c33554c73294952b7f990bd9aaadec to your computer and use it in GitHub Desktop.
Save vtsingaras/74c33554c73294952b7f990bd9aaadec to your computer and use it in GitHub Desktop.
Simple Harmony Transpiler Tutorial

About
A hopefully easy tutorial on how to make a Transpiler with Harmony Based on patching a method in the game RimWorld

1) ILSpy

Get yourself the ILSpy that Zhentar has modified so it generates better code: https://github.com/Zhentar/ILSpy/releases (mad props to Zhentar!)

2) Decompile

Start it and open the rimworld DLL. It is located in
C:\Program Files (x86)\Steam\steamapps\common\RimWorld\RimWorldWin_Data\Managed\Assembly-CSharp.dll

3) The original method

Now search for CheckForErrors and find the one from RimWorld.Dialog_FormCaravan, double click it. You should see the source in the window with yellow background.

4) Viewing IL Code

I guess you knew those steps already, so lets go into uncharted teritories now. Choose IL instead of C# in the dropdown menu at the top and you should see something like that:

.method private hidebysig 
	instance bool CheckForErrors (
		class [mscorlib]System.Collections.Generic.List`1<class Verse.Pawn> pawns
	) cil managed 
{
	// Method begins at RVA 0xbb290
	// Code size 617 (0x269)
	.maxstack 64
	.locals init (
		[0] class Verse.Pawn,
		[1] int32,
		[2] int32,
		[3] int32,
		[4] int32,
		[5] class RimWorld.Dialog_FormCaravan/'<CheckForErrors>c__AnonStorey3F6',
		[6] class RimWorld.Dialog_FormCaravan/'<CheckForErrors>c__AnonStorey3F8'
	)

	IL_0000: newobj instance void RimWorld.Dialog_FormCaravan/'<CheckForErrors>c__AnonStorey3F6'::.ctor()
	IL_0005: stloc.s 5
	...

Switch between IL and C# so you get a feeling of how the two roughly compare to each other. Most of the time, the structure is quite similar but sometimes the compiler moves things around, mainly the contents of IF statements where it does switch the logic around and places code at the end of the IL code and jumps to it.

5) IL Code Basics

Now this looks scary but it is actually a simple code. Beside the mumbo jumbo at the beginning, each row starting with IL_ is a one-part or two-parts code. The first part is the operation and the optional second part the value that operation works with. The value can be a lot of different things: int, Type, FieldInfo, MethodInfo, Label etc.

The cool thing is, that the whole IL code system works like a stack, basically a deck of cards where you only deal with the top by adding or removing a card. Whether you add numbers or call subroutines, almost everything happens on the stack. So lets ignore for now that you don't know jack of the different operations and just look at this section that contains the worlds reform, MassUsage and Message in that order (that's how I found it):

IL_0078: ldarg.0
IL_0079: ldfld bool RimWorld.Dialog_FormCaravan::reform
IL_007e: brtrue IL_00ac
IL_0083: ldarg.0
IL_0084: call instance float32 RimWorld.Dialog_FormCaravan::get_MassUsage()
IL_0089: ldarg.0
IL_008a: call instance float32 RimWorld.Dialog_FormCaravan::get_MassCapacity()
IL_008f: ble.un IL_00ac
IL_0094: ldarg.0
IL_0095: call instance void RimWorld.Dialog_FormCaravan::FlashMass()
IL_009a: ldstr "TooBigCaravanMassUsage"
IL_009f: call string Verse.Translator::Translate(string)
IL_00a4: ldc.i4.2
IL_00a5: call void Verse.Messages::Message(string, valuetype Verse.MessageSound)
IL_00aa: ldc.i4.0
IL_00ab: ret

That corresponds to your C# code:

if (!this.reform && this.MassUsage > this.MassCapacity)
{
	this.FlashMass();
	Messages.Message("TooBigCaravanMassUsage".Translate(), MessageSound.RejectInput);
	return false;
}

Let's analyze:

  • this.reform - we need the field reform from this. So we load this onto the stack (topmost now is this). Then we load the field reform onto it: The load field operation will first take the topmost element from the stack (the this we just put on it) leaving it empty, then do it's operation and put the result back onto the stack. So now, the stacks topmost element is the field reform.

  • the ! on this.reform - the compiler thinks that testing the field for true and jumping over the code (to a much later line IL_00ac) is the way to go here. Bailing out if this.reform is true. The thing to easy overlook is again, that the comparison must consume (remove) the topmost value from the stack to compare it to true so now the stack is empty again!

  • now for this.MassUsage > this.MassCapacity - this is again the same drill:

    • this onto the stack
    • MassUsage call consumes one element and leaves result on stack
    • this onto the stack again
    • MassCapacity call consumes one element and leaves result on stack
      Which leaves us at this point with two elements on the stack so we can call the compare which will consume both elements and jump. The compiler again switched the logic and tests for <= which is ble (branch when less equal) to the same bail out line as before (IL_00ac).

Since we now have completed the IF statement, it is time to do some work. But again, this is just stack operations: putting stuff onto the stack and consuming it which may result into another thing put onto it again.

At this point, I would ask two things: Where can I possibly know all the IL code operations and how they work? And: how does method calling work exactly? Which leads us to:

6) IL Code reference

Overview on Wikipedia:
https://en.wikipedia.org/wiki/List_of_CIL_instructions](https://en.wikipedia.org/wiki/List_of_CIL_instructions)

More in detail on Microsoft page about the class OpCodes (used in transpilers). I use this all the time since it has links to each code and is kind of official and handy at the same time:
https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes(v=vs.110).aspx

Pretty good tutorial on CodeProject:
https://www.codeproject.com/Articles/3778/Introduction-to-IL-Assembly-Language

And for all the nitty gritty details (only as a last resort):
http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf

7) Method calling

I would start with pointing out the difference between static and non-static methods. Since all you have is the stack and your opcodes, there is no big difference. Remember how to do extension methods where you put "this" as the first argument to a method when you define it? It's a hint that instance methods have an invisible first parameter that represents this.

Back to the beginning of our code

IL_0078: ldarg.0
IL_0079: ldfld bool RimWorld.Dialog_FormCaravan::reform
IL_007e: brtrue IL_00ac

We use LDARG 0 to load the argument number 0 (zero based it is the first) onto the stack. That is this. Which is of type Dialog_FormCaravan. So after that operation, the topmost element on our stack is the object instance of type Dialog_FormCaravan.

So in order to call an instance method, you have to add the object that this method is called on onto the stack (with ST... opcodes), then all the arguments that method takes. Then you call the method with the CALL opcode. That's what happened in our code when we called MassUsage

IL_0083: ldarg.0
IL_0084: call instance float32 RimWorld.Dialog_FormCaravan::get_MassUsage()

It is a property and you already guessed that their accessor methods start internally with "get_" or "set_". The getter of course takes no extra arguments but you still need to load the object instance onto the stack. In this case "this" but it could of course be anything. This allows for cool chaining of calls. A fake example:

ldfld Foo SomeStaticClass::theFoo
ldfld Bar Foo::theBar
call instance void Bar::Cool()

which would be this line in C#:

SomeStaticClass.theFoo.theBar.Cool()

Please note that the first ldfld does not need a stack element since it is a static field. The next ldfld just pops/pushes the stack and the last call will consume that element again to call Cool on it. This leaves the stack empty (or in the same state as before).

Which leads to me saying the single most important thing to remember with IL code (CIL as Microsoft calls it):

You mess with the stack you mess with the devil. So never shall you leave unnecessary elements on the stack as I will punish you with errors when you try to compile that shit. Same goes of course for underflows where you put an opcode somewhere where there isn't enough (or the wrong) elements on the stack. The interesting thing here is that since CIL is so simple, the compiler can do most of those checks during compile time!

Now, to make some progress, we switch to Harmony and how to deal with all this. Again, it looks complicated but in essence, it is simple data processing. Since all we do is to rewrite code, we have not access to runtime state. That leaves us with a simple black box: codes in, do something, codes out. That's what Harmony calls a Transpiler.

8) Harmony Transpiler

The basic patch looks like this. Let's take our example:

[HarmonyPatch(typeof(Dialog_FormCaravan))]
[HarmonyPatch("CheckForErrors")]
public static class Dialog_FormCaravan_CheckForErrors_Patch
{
	static IEnumerable<CodeInstruction> Transpiler(IEnumerable<CodeInstruction> instructions)
	{
		// do something
	}
}

Some quick remarks: The patch annotations specify the target method. Read about defining patches here: https://github.com/pardeike/Harmony/wiki/Patching

Also, the name of the method is Transpiler so Harmony knows it is a transpiler. The methods signature must be that it returns IEnumerable<CodeInstruction> instructions and the arguments must contain at least one that is of type IEnumerable<CodeInstruction> and you can also inject the code generator with ILGenerator generator and the original method info with MethodBase method.

So how does one define what the transpiler does? I strongly recommend reading up on how to use "yield" with methods returning IEnumerable but you can also go old fashion and convert anything to a list and at the end back to an IEnumerable. Here is a nice tutorial on yield: https://www.kenneth-truyers.net/2016/05/12/yield-return-in-c/

9) The patch

Luckily for us, your change is rather simple. We just need to remove that code or make it not execute. We could do this by "nulling it out" with a OpCodes.Nop (no operation) or we add a non-conditional jump somewhere. Or we could remove that section of the code all together. Let's have a look again at our code, this time with some extra rows before and after:

IL_0077: ret

IL_0078: ...codes...
IL_007e: brtrue IL_00ac
IL_0083: ...codes...
IL_008f: ble.un IL_00ac
IL_0094: ...codes...
IL_00ab: ret

IL_00ac: ...codes...

I just removed all the codes that do not change execution flow. Starting with 0077 which is the end of the code just above if (!this.reform && this.MassUsage > this.MassCapacity). As we saw before, it jumps twice to 00AC and finally ends with the return at 00AB.

This looks simple. We could replace the codes from 0078 - 00AB with NOPs or we could insert an extra jump right between 0077 and 0078.

Now, normally, I would do this in a way that minimizes the risk of it breaking in case RimWorld changes the code of this particular method. It usually involves finding some anchors that are unique in the code and referencing everything from there. I.e. if we would know the start of 0078 in the above example, I would look for the first jump and use the 00AC I find to determine the end. But the main problem with this part of the code is that it isn't that unique. One could go overkill and match each and every opcode to find the correct sequence but that would break with code changes anyway, so I take the easy way out and just check if we can find the RET at position 0077 then insert an unconditional jump to 00AC.

Here, we have a few choices again. The simplest way would be to count the opcodes from the start of the method and just when the count indicates we are at 0077 (those are not the number of opcodes but the actual byte count that operation starts). A better approach is to find a nice pattern and in fact, we have it. If you look at just the RET codes, you will see that they divide the whole code into sections. We could look for a section containing a call to get_MassUsage() and then remove it from the code. Let's do that.

11) The execution

Strategy: Search for RET codes. For every code found, search until the next RET and look for the usage of the string "TooBigCaravanMassUsage". If found, continue to find the following RET and remove everything from right after the first RET to including the second RET:

[HarmonyPatch(typeof(Dialog_FormCaravan))]
[HarmonyPatch("CheckForErrors")]
public static class Dialog_FormCaravan_CheckForErrors_Patch
{
	static IEnumerable<CodeInstruction> Transpiler(IEnumerable<CodeInstruction> instructions)
	{
		var foundMassUsageMethod = false;
		int startIndex = -1, endIndex = -1;

		var codes = new List<CodeInstruction>(instructions);
		for (int i = 0; i < codes.Count; i++)
		{
			if (codes[i].opcode == OpCodes.Ret)
			{
				if (foundMassUsageMethod)
				{
					Log.Error("END " + i);

					endIndex = i; // include current 'ret'
					break;
				}
				else
				{
					Log.Error("START " + (i + 1));

					startIndex = i + 1; // exclude current 'ret'

					for (int j = startIndex; j < codes.Count; j++)
					{
						if (codes[j].opcode == OpCodes.Ret)
							break;
						var strOperand = codes[j].operand as String;
						if (strOperand == "TooBigCaravanMassUsage")
						{
							foundMassUsageMethod = true;
							break;
						}
					}
				}
			}
		}
		if (startIndex > -1 && endIndex > -1)
		{
			// we cannot remove the first code of our range since some jump actually jumps to
			// it, so we replace it with a no-op instead of fixing that jump (easier).
			codes[startIndex].opcode = OpCodes.Nop;
			codes.RemoveRange(startIndex + 1, endIndex - startIndex - 1);
		}

		return codes.AsEnumerable();
	}
}

There it is. Add this to your code and use the normal Harmony bootstrapping as described in the Harmony Wiki and you have just done your first Harmony Transpiler!

Enjoy
Andreas Pardeike
https://github.com/pardeike

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment