StanHash/summary_of_changes.md

## summary_of_changes.md

      
# ColorzCore 2024.05 Summary of changes

This is a big update to ColorzCore, introducing a handful of new features, fixing a few bugs, and changing a few fundamental things (hopefully for the better).
This is a summary of the most important features and changes. Many smaller features, changes and details were omitted to make this digestible, but you can read the full list of changes in the pull-request description here: FireEmblemUniverse/ColorzCore#63.
## Offsets were changed to addresses

One of the most important changes in this new version of ColorzCore is the migration from ROM offsets to memory addresses. This is the biggest source of breaking changes, so to know how best to fix such issues, be sure to at least try your best to understand this change.
An address, sometimes also called a pointer (though I would argue a pointer is a related but different thing), is the number that identifies the location of data when the binary is loaded into memory. On the GBA, the ROM is loaded at memory address 0x08000000, so any ROM offset "foo" will correspond to address "0x08000000 + foo".
Here is the specific list of things that changed regarding offsets vs. addresses:

- The identifier CURRENTOFFSET now expands to an address rather than a ROM offset.
- Labels are defined as addresses rather than ROM offsets.
- ORG now accepts addresses as well as ROM offsets. It silently converts offsets to addresses if those offsets are within the 0 to 0x1FFFFFF range.
- POIN (and any other raw that has pointer fields) now only converts offsets to addresses if they are within the 1 to 0x1FFFFFF range. (Offset 0 was already never converted, to allow defining null pointers; with this change, you can now refer to ROM offset 0 by writing its address, 0x08000000, instead.)
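To make the rules above concrete, here is a minimal Python model of how ORG and POIN treat the values they are given. It only illustrates the ranges described in this section and is not ColorzCore's actual implementation. The function names are hypothetical.

```python
ROM_BASE = 0x08000000        # the GBA maps the ROM at this address
ROM_OFFSET_MAX = 0x1FFFFFF   # largest value still treated as a ROM offset

def offset_to_address(offset: int) -> int:
    """Address of a ROM offset once the ROM is loaded at 0x08000000."""
    return ROM_BASE + offset

def org_value(value: int) -> int:
    """ORG: values in 0..0x1FFFFFF are silently converted to addresses."""
    if 0 <= value <= ROM_OFFSET_MAX:
        return offset_to_address(value)
    return value

def poin_value(value: int) -> int:
    """POIN: only 1..0x1FFFFFF is converted, so 0 stays a null pointer."""
    if 1 <= value <= ROM_OFFSET_MAX:
        return offset_to_address(value)
    return value

print(hex(poin_value(0x123456)))    # 0x8123456
print(hex(poin_value(0)))           # 0x0 (null pointer)
print(hex(poin_value(0x08000000)))  # 0x8000000 (ROM offset 0, written as an address)
print(hex(poin_value(0x02000000)))  # 0x2000000 (a RAM address, left unchanged)
```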

This last change to POIN is particularly interesting, as it now allows one to give not only ROM addresses but also any RAM address. Previously this would have required WORD (which was not always possible, for example with events produced by tools such as lyn).
## Symbol assignment

Symbol assignment finally made its way into ColorzCore!
```
MySymbol := 776

WORD MySymbol
```

Symbols are a generalization of labels, and thus have many of the same properties:

- They are visible to raws even before they are assigned.
- They respect {} local scopes (unlike macros).
- They are output in the .sym file if the --nocash-sym switch is given.

In addition, a symbol does not need its value to be evaluatable at assignment time. For instance, the following script is perfectly sound:
```
    MySymbol := MyLabel + 14

MyLabel:
    FILL 776 0xFF

    ORG MySymbol
    BYTE 0
```

Label definitions can now be seen as syntactic sugar for the following symbol assignment:

```
MyLabel := CURRENTOFFSET
```

Like labels, symbols can only be integers (32-bit two's complement signed integers, to be precise).
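The toy Python model below sketches two properties described above. First, a symbol's expression is stored unevaluated and only computed once every name it uses is known. Second, values are 32-bit signed integers. `SymbolTable` and `to_int32` are hypothetical names, and the address given to MyLabel is made up. This document does not describe how ColorzCore actually schedules evaluation.

```python
from typing import Callable, Dict

def to_int32(value: int) -> int:
    """Wrap an arbitrary integer into the signed 32-bit two's complement range."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value

class SymbolTable:
    """Toy symbol table: each symbol stores an unevaluated expression that is
    only computed on demand, so it may refer to symbols assigned later."""

    def __init__(self) -> None:
        self._exprs: Dict[str, Callable[["SymbolTable"], int]] = {}

    def assign(self, name: str, expr: Callable[["SymbolTable"], int]) -> None:
        self._exprs[name] = expr

    def value(self, name: str) -> int:
        # raises KeyError if the symbol was never assigned
        return to_int32(self._exprs[name](self))

table = SymbolTable()
# MySymbol := MyLabel + 14   (MyLabel is not known yet at this point)
table.assign("MySymbol", lambda t: t.value("MyLabel") + 14)
# MyLabel:   (hypothetical address chosen for the example)
table.assign("MyLabel", lambda t: 0x08000200)

print(hex(table.value("MySymbol")))  # 0x800020e
print(to_int32(0xFFFFFFFF))          # -1
```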
## New operators

A slew of new mathematical operators were introduced, most of which are compare or conditional operators:

- `A < B`, `A <= B`, `A == B`, `A != B`, `A >= B` and `A > B`, for comparing operands with each other. They evaluate to either 0 or 1 depending on the result.
- `A && B`, `A || B` and `!A`, for boolean logic.

  The `&&` and `||` operators evaluate to the "last meaningful value". That is: `A && B` is `B` if `A` is non-zero, otherwise zero; and `A || B` is `A` if it is non-zero, otherwise `B`.

- The bitflip operator `~A`, which was long overdue. This operator flips all bits in its operand.
- The undefined-coalescing operator `A ?? B`. This operator evaluates to `A` if `A` does not involve any undefined symbol or label, otherwise it evaluates to `B`. This is the only operator that accepts an expression involving undefined symbols or labels as one of its operands.
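The semantics listed above can be modeled in a few lines of Python to build intuition. The function names are hypothetical. Only the comparison operators are documented as yielding 0 or 1, so `!A` returning exactly 0 or 1 is an assumption here.

```python
def cmp_lt(a: int, b: int) -> int:
    # comparison operators (here A < B) yield exactly 0 or 1
    return int(a < b)

def log_and(a: int, b: int) -> int:
    # A && B: B if A is non-zero, otherwise 0
    return b if a != 0 else 0

def log_or(a: int, b: int) -> int:
    # A || B: A if A is non-zero, otherwise B
    return a if a != 0 else b

def log_not(a: int) -> int:
    # assumption: !A yields 1 when A is zero, 0 otherwise
    return int(a == 0)

def bit_flip(a: int) -> int:
    # ~A on a 32-bit two's complement value; Python's ~ gives the same
    # signed result, -(a + 1), for any value already in the int32 range
    return ~a

print(log_and(3, 7))  # 7
print(log_and(0, 7))  # 0
print(log_or(3, 7))   # 3
print(log_or(0, 7))   # 7
print(bit_flip(0))    # -1, i.e. 0xFFFFFFFF as a signed 32-bit value
```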

The ASSERT statement was adjusted to behave differently when it is given a compare or conditional expression: the assertion fails if the expression evaluates to 0. In all other cases, it still behaves the old way (failure if negative).
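Here is a minimal Python sketch of that rule, using a hypothetical `assert_fails` helper. Whether an expression counts as "compare or conditional" is passed in explicitly here. ColorzCore presumably decides that from the expression itself.

```python
def assert_fails(value: int, is_conditional: bool) -> bool:
    """True if ASSERT would report a failure for an expression evaluating to value.

    is_conditional tells whether the expression is a compare or conditional
    expression (==, <, &&, !, ...), as opposed to ordinary arithmetic.
    """
    if is_conditional:
        return value == 0   # new rule: a false condition fails
    return value < 0        # old rule, kept for arithmetic: a negative value fails

# ASSERT UpperBound - CURRENTOFFSET  with 0x10 bytes of room: arithmetic, passes
print(assert_fails(0x10, is_conditional=False))  # False
# ASSERT CURRENTOFFSET <= UpperBound  when it does not hold: evaluates to 0, fails
print(assert_fails(0, is_conditional=True))      # True
# the same 0 coming from an arithmetic expression does not fail
print(assert_fails(0, is_conditional=False))     # False
```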
Thanks to the ?? operator, one can define an "IsLabelDefined" macro like so:

```
#define IsLabelDefined(label_name) "(((label_name) || 1) ?? 0)"
```
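Why does this macro work? The following Python sketch models "undefined" as `None`. It assumes that `None` propagates through `||` like through any other operator, and that only `??` can absorb it. The `|| 1` part makes a defined label whose value happens to be 0 still count as defined. All names are hypothetical.

```python
from typing import Dict, Optional

def lookup(symbols: Dict[str, int], name: str) -> Optional[int]:
    # None stands for "involves an undefined symbol or label"
    return symbols.get(name)

def log_or(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # assumption: an undefined operand makes the whole expression undefined
    if a is None or b is None:
        return None
    return a if a != 0 else b

def coalesce(a: Optional[int], b: int) -> int:
    # A ?? B: A if it involves no undefined name, otherwise B
    return a if a is not None else b

def is_label_defined(symbols: Dict[str, int], name: str) -> int:
    # mirrors (((label_name) || 1) ?? 0)
    return coalesce(log_or(lookup(symbols, name), 1), 0)

symbols = {"MyLabel": 0x08000200, "ZeroSymbol": 0}
print(is_label_defined(symbols, "MyLabel") != 0)  # True (evaluates to the label itself)
print(is_label_defined(symbols, "ZeroSymbol"))    # 1, thanks to "|| 1"
print(is_label_defined(symbols, "Missing"))       # 0
```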

## The generic #if conditional directive

Previously, the only conditional interpretation constructs available were #ifdef and #ifndef.
Now, ColorzCore features the generic #if directive. It is one of the two primary places where the new compare and conditional operators are meant to be used (the other being ASSERT).
The use of #if is quite simple:
```
#if MAX_ITEMS != 0
// ...
#else
// ...
#endif
```

## Relaxed #define syntax

Before this update, if one wanted to encapsulate complex statements in a macro, they would need to enclose the replacement sequence in a string.
This is no longer necessary. The replacement part of the #define directive can now be any sequence of tokens and will be treated as such. The one exception is a replacement consisting of a single string token: in that case, the contents of the string are parsed as tokens, as before.
Example:

```
#define MyCustomData(param_a, param_b) ALIGN 2 ; SHORT 0xFE (param_a) (param_b)
```

By the way, you can escape any newline character using `\`, so you can write this as well:

```
#define MyCustomData(param_a, param_b) \
    ALIGN 2 ; \
    SHORT 0xFE (param_a) (param_b)
```

This is not actually new (it was introduced by Sme about a year ago), but I figured it was worth mentioning.
## Non-productive definitions

Speaking of #define, one frustrating limitation of previous versions of ColorzCore was that you couldn't have an identifier that is both visible to #ifdef AND a valid name for a label (or now, a symbol):

```
    #define ItemTable ???
    // ...
ItemTable: // Will probably break!!!!
    // ...
    // one would want this to not be satisfied if ItemTable was defined as a label
    #ifndef ItemTable
    #define ItemTable <Whatever Default Value>
    #endif
```

A workaround was to define the macro as some other name that is a valid symbol identifier:

```
#define ItemTable MyItemTable
```

But that led to symbols not being named what one would expect when exported in .SYM files.
Non-productive definitions allow one to both define a macro AND keep its name available for symbols without such hacks, by having the definition expand into exactly itself:

```
#define ItemTable ItemTable // ItemTable is non-productive
```

Before, expanding this would have led to an infinite loop. Now such definitions are simply not expanded.
Note that the above example could also be worked around using an IsLabelDefined macro defined in terms of ??.
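A toy expander for parameterless macros can illustrate the idea. This is a Python sketch with hypothetical names. It treats a definition as non-productive when its body is exactly the macro's own name, and it is not how ColorzCore is implemented internally. The `TableEnd` macro is a made-up example.

```python
from typing import Dict, List

def expand(tokens: List[str], macros: Dict[str, List[str]], depth: int = 0) -> List[str]:
    """Expand parameterless macros, leaving non-productive ones untouched."""
    if depth > 64:
        raise RecursionError("macro expansion too deep (recursive definitions?)")
    out: List[str] = []
    for token in tokens:
        body = macros.get(token)
        if body is None or body == [token]:
            # not a macro, or a non-productive one (expands to exactly itself)
            out.append(token)
        else:
            out.extend(expand(body, macros, depth + 1))
    return out

macros = {
    "ItemTable": ["ItemTable"],               # #define ItemTable ItemTable
    "TableEnd": ["ItemTable", "+", "0x100"],  # hypothetical helper macro
}

print("ItemTable" in macros)                # True: #ifdef ItemTable is satisfied
print(expand(["POIN", "TableEnd"], macros)) # ['POIN', 'ItemTable', '+', '0x100']
```

The name stays visible to #ifdef because it is still defined, while expansion leaves the token alone, so it can still resolve to a label or symbol.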
## Formatted interpolation within strings for messages and more

```
MESSAGE "Start of section at address {CURRENTOFFSET:X8}"

// ...

MESSAGE "End of section at address {CURRENTOFFSET:X8}"
```

This was initially meant for use with the MESSAGE, WARNING and ERROR statements.
Note that now that the `__FILE__` and `__LINE__` identifiers work, one can write error macro helpers!

```
#define MyCustomError ERROR "Something bad happened in file {__FILE__}, line {__LINE__}"
```
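The `X8` specifier presumably follows the .NET numeric format convention: uppercase hexadecimal, padded to at least 8 digits. The Python sketch below emulates only plain `{NAME}` (decimal) and `{NAME:Xn}`/`{NAME:xn}` placeholders. The name `interpolate` is hypothetical, and masking negative values to 32 bits mimics how .NET prints a negative int32 in hex.

```python
import re
from typing import Dict

_PLACEHOLDER = re.compile(r"\{(\w+)(?::([^}]*))?\}")

def interpolate(template: str, symbols: Dict[str, int]) -> str:
    """Replace {NAME} and {NAME:Xn}/{NAME:xn} placeholders with symbol values."""
    def replace(match) -> str:
        name, spec = match.group(1), match.group(2)
        value = symbols[name]
        if not spec:
            return str(value)  # no format spec: plain decimal
        kind, width = spec[0], spec[1:] or "1"
        if kind not in "Xx" or not width.isdigit():
            raise ValueError(f"unsupported format spec: {spec!r}")
        # mask to 32 bits so negative values print as two's complement hex
        return format(value & 0xFFFFFFFF, f"0{width}{kind}")
    return _PLACEHOLDER.sub(replace, template)

print(interpolate("Start of section at address {CURRENTOFFSET:X8}",
                  {"CURRENTOFFSET": 0x08ABCDE0}))
# Start of section at address 08ABCDE0
```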

## New ReadByteAt built-in macro and its friends

ReadByteAt(offset) gets the value of the byte at the given offset (or address) in the existing binary ColorzCore is writing to. This only works in A mode, since in AA mode there is no such binary.
ReadShortAt(offset) and ReadWordAt(offset) also exist as convenient extensions.
This can be used, for example, to detect if a specific part of the base ROM is what it is expected to be, or was tampered with by prior modification (using the new #if directive!).
These macros don't see anything that is being written by the current invocation of ColorzCore, only what was there before.
I don't recommend relying on these macros unless you specifically need to.
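As a mental model, these macros read from the original file contents rather than from what the current run writes. The Python sketch below uses hypothetical names and makes two assumptions: ROM addresses are mapped back to offsets, and multi-byte reads are little-endian as on the GBA. Results are unsigned here, whereas ColorzCore may treat a word as a signed 32-bit value.

```python
ROM_BASE = 0x08000000
ROM_SIZE_MAX = 0x2000000  # offsets 0..0x1FFFFFF

def _to_offset(where: int) -> int:
    # assumption: ROM addresses are mapped back to offsets; anything else is
    # already treated as an offset
    if ROM_BASE <= where < ROM_BASE + ROM_SIZE_MAX:
        return where - ROM_BASE
    return where

def _read(rom: bytes, where: int, size: int) -> int:
    offset = _to_offset(where)
    if offset < 0 or offset + size > len(rom):
        raise IndexError(f"read of {size} byte(s) at 0x{where:X} is out of bounds")
    # assumption: little-endian, as on the GBA; result is unsigned here
    return int.from_bytes(rom[offset:offset + size], "little")

def read_byte_at(rom: bytes, where: int) -> int:
    return _read(rom, where, 1)

def read_short_at(rom: bytes, where: int) -> int:
    return _read(rom, where, 2)

def read_word_at(rom: bytes, where: int) -> int:
    return _read(rom, where, 4)

# first bytes of a hypothetical base ROM (an ARM branch instruction, then one more byte)
rom = bytes([0x2E, 0x00, 0x00, 0xEA, 0x24])
print(hex(read_word_at(rom, 0x08000000)))  # 0xea00002e
print(hex(read_short_at(rom, 2)))          # 0xea00
```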
## BASE64

This brings what MinN-11 already implemented in FireEmblemUniverse/ColorzCore#59.
You can now embed data defined by Base64 strings using the new BASE64 statement. Like so:
```
BASE64 "RXZlbnQgQXNzZW1ibGVy"
```

This example emits the ASCII-encoded string "Event Assembler".
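You can check what a BASE64 statement will emit, or produce the string for some binary data, with Python's standard `base64` module. This is just a convenience outside of ColorzCore.

```python
import base64

# what BASE64 "RXZlbnQgQXNzZW1ibGVy" emits
data = base64.b64decode("RXZlbnQgQXNzZW1ibGVy")
print(data)       # b'Event Assembler'
print(len(data))  # 15

# going the other way: the string to paste into a BASE64 statement
encoded = base64.b64encode(b"Event Assembler").decode("ascii")
print(encoded)    # RXZlbnQgQXNzZW1ibGVy
```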
## STRING and custom TBL encodings

The String macro has evolved into the STRING statement!
```
STRING "My Super String"
```

The STRING statement also accepts strings with formatted interpolation:
```
    GIVEN_MONEY_VALUE := 5000

MyGiveMoneyMessage:
    STRING "You got {GIVEN_MONEY_VALUE} monies!"
```

By default, strings will be encoded in UTF-8. You can, however, define custom encodings by loading a .TBL file using the new #inctbl directive. For example:
```
#inctbl "FE3" "fe3.tbl"

...

STRING "Marth" "FE3"
```

The variant of TBL files that ColorzCore accepts is defined as follows:

- Any line that contains only whitespace (or nothing) is ignored.
- Leading whitespace is ignored (but not trailing whitespace, of course: one would probably like to define encodings for spaces).
- Lines that begin with the character `*` define the newline character's encoding. The rest of the line is the corresponding hex encoding.
- Lines that begin with the character `/` are ignored. They are meant to define the end-of-string character, but STRING does not terminate strings (it just emits the characters).
- Any other line is formatted as `HEX=text`, where `HEX` is the hex encoding of `text`.

Encodings are "hex" strings, that is, sequences of character pairs that each define the value of one byte in hexadecimal. For instance, the encoding "0102" defines the byte sequence { 1, 2 }. The length of the hex string defines the length of the corresponding byte sequence.

This should encompass most existing .TBL files.
The existing String(...) built-in macro is now defined in terms of the STRING statement.
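Here is a Python sketch of a parser for the TBL variant described above, plus an encoder. The parser follows the listed rules. The encoder's greedy longest-match strategy is an assumption: it prefers a multi-character entry like "AB" over "A" then "B", and this document does not say how ColorzCore chooses between overlapping entries. `parse_tbl` and `encode` are hypothetical names.

```python
from typing import Dict

def parse_tbl(text: str) -> Dict[str, bytes]:
    """Map each text sequence to its encoded bytes."""
    table: Dict[str, bytes] = {}
    for raw_line in text.splitlines():
        line = raw_line.lstrip()            # leading whitespace is ignored
        if not line.strip():
            continue                        # blank or whitespace-only line
        if line.startswith("*"):
            table["\n"] = bytes.fromhex(line[1:].strip())  # newline encoding
        elif line.startswith("/"):
            continue                        # end-of-string marker, unused by STRING
        else:
            hex_part, sep, chars = line.partition("=")  # split on the first '='
            if not sep:
                raise ValueError(f"malformed TBL line: {raw_line!r}")
            table[chars] = bytes.fromhex(hex_part)      # trailing spaces kept in chars
    return table

def encode(s: str, table: Dict[str, bytes]) -> bytes:
    """Greedy longest-match encoding (an assumption, see above)."""
    longest = max(map(len, table), default=0)
    out = bytearray()
    i = 0
    while i < len(s):
        for n in range(min(longest, len(s) - i), 0, -1):
            piece = s[i:i + n]
            if piece in table:
                out += table[piece]
                i += n
                break
        else:
            raise KeyError(f"no TBL entry for {s[i]!r}")
    return bytes(out)

# "20= " maps a space; built with join so the trailing space is explicit
tbl = parse_tbl("\n".join(["  41=A", "42=B", "8081=AB", "20= ", "*0A", "/FF"]))
print(encode("AB A\n", tbl).hex())  # 808120410a
```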
## Breaking changes

Some of the changes introduced end up breaking some existing scripts. Here are the things that are currently known to be broken.
### Unguarded expression macros

If one defines a macro like this:

```
#define MySum(a, b) a + b
```

And later uses it like this:

```
WORD MySum(10, 20) * 5
```

Intuitively, you would expect this to be equivalent to WORD 150. However, after macro expansion, it is equivalent to the following statement:

```
WORD 10 + 20 * 5
```

This is perhaps a bit unintuitive, as it means that the body of the macro, which is a perfectly fine standalone expression, is actually "broken apart" by precedence rules ('*' binds "stronger" than '+'), making this equivalent to WORD 110. This is, however, a well-known quirk of C-like textual macro replacement, and something developers have worked around for decades by guarding expressions within macros with parentheses.
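The precedence trap can be reproduced with plain string substitution in Python, whose `+` and `*` precedence matches what is described above. `expand_unguarded` and `expand_guarded` are hypothetical stand-ins for the macro expander.

```python
def expand_unguarded(a: str, b: str) -> str:
    # #define MySum(a, b) a + b
    return f"{a} + {b}"

def expand_guarded(a: str, b: str) -> str:
    # #define MySum(a, b) ((a) + (b))
    return f"(({a}) + ({b}))"

unguarded = expand_unguarded("10", "20") + " * 5"  # "10 + 20 * 5"
guarded = expand_guarded("10", "20") + " * 5"      # "((10) + (20)) * 5"

print(eval(unguarded))  # 110
print(eval(guarded))    # 150
```

Parenthesizing both the parameters and the whole body protects against operators on either side of the macro use.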
The reason why this is a breaking change is because in previous versions of ColorzCore, a quirk in the way #define worked meant that macros like these could sometimes, in certain fairly specific circumstances, behave the """intuitive""" way. This is no longer the case.
One known case of such a broken macro was found in the Skill System and is documented here: FireEmblemUniverse/SkillSystem_FE8#637.
A new warning was introduced to detect these kind of "broken" macro expansions within expressions.
### Marked pointers

POIN XYZ will only convert to an address if XYZ is between 1 and 0x1FFFFFF. This means that if you set some high bit to mark a pointer for a specific use, you may run into problems.
This is noteworthy because the setText macro, commonly used to set the anti-Huffman mark on a pointer, is affected.
This is only problematic if you use an offset that isn't derived from a label or CURRENTOFFSET. For example:

```
POIN 0xABCD | 0x80000000 // Anti-huffman marked pointer to ROM offset 0xABCD.
```

The above expression will fail to map the marked offset to a marked address.
To fix this, use addresses directly:

```
POIN 0x0800ABCD | 0x80000000 // Anti-huffman marked pointer to address 0x0800ABCD.
```
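Applying the POIN range rule from the offsets section, this small Python check shows why the first form goes wrong and the second works. The names are hypothetical, and values are treated as unsigned 32-bit for readability.

```python
ROM_BASE = 0x08000000
ANTI_HUFFMAN_MARK = 0x80000000

def poin_value(value: int) -> int:
    """POIN rule: only values in 1..0x1FFFFFF are converted to addresses."""
    value &= 0xFFFFFFFF
    return value + ROM_BASE if 1 <= value <= 0x1FFFFFF else value

# marked offset: 0x8000ABCD is outside the offset range, so it is emitted as-is
print(hex(poin_value(0xABCD | ANTI_HUFFMAN_MARK)))      # 0x8000abcd (wrong)
# marked address: already an address, emitted as intended
print(hex(poin_value(0x0800ABCD | ANTI_HUFFMAN_MARK)))  # 0x8800abcd
```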

### The "SetSymbol" macro

This is something that I used to do quite a bit. Because there was no symbol assignment in ColorzCore, I used this hack to be able to give symbolic names to constants that respected local scopes.
One would define a macro like so:
```
#define SetSymbol(name, value) "PUSH ; ORG (value) ; name : ; POP"
```

And use it like this, for example:
```
MyEventScript:
{
    SetSymbol(my_label_1, 1)
    SetSymbol(my_label_2, 2)

    // ...
LABEL my_label_1
    // ...
LABEL my_label_2
    // ...
    ENDA
}
```

Since labels are now addresses, this is now dangerous, as the values assigned to these "symbols" aren't the values given to the macro (for example, my_label_1 in the above example would actually hold the value 0x08000001).
Of course, one can easily remedy this by using symbol assignment, which is what this macro tried to emulate, after all.
A new warning was introduced that detects uses of such macros.
### ASSERT with CURRENTOFFSET and an offset

Consider the following statement:
```
ASSERT UpperBound - CURRENTOFFSET
```

If UpperBound is a ROM offset, this will now always fail, because CURRENTOFFSET is now an address (at least 0x08000000) and the difference is therefore negative.
In order to not break existing scripts too hard, the assertion failure that would be raised has been replaced by a simple warning, but only for assertions following this specific form.
Fixing this is easy: make UpperBound an address. One could also use the new conditional operators instead of subtraction to make this clearer (however, this means that the "overflow" won't be printed as part of the assertion failure message).
If you need to keep compatibility with older versions of ColorzCore, you can restrict CURRENTOFFSET by using the bitwise AND or the modulus operator:
```
ASSERT UpperBound - (CURRENTOFFSET & 0x1FFFFFF)
```

You may want to define a friendlier "OffsetOf" macro to make this easier to read. You can also do the conversion the other way around, defining an "AddressOf" macro that ORs its argument with 0x08000000.
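The arithmetic behind both fixes can be checked in Python. `offset_of` and `address_of` mirror the suggested OffsetOf/AddressOf macros, whose exact definitions are up to you. The CURRENTOFFSET and UpperBound values are made up for the example. ORing with 0x08000000 equals adding it only because offsets never exceed 0x1FFFFFF.

```python
ROM_BASE = 0x08000000
ROM_OFFSET_MASK = 0x1FFFFFF

def offset_of(address: int) -> int:
    # like an OffsetOf(x) macro defined as (x & 0x1FFFFFF)
    return address & ROM_OFFSET_MASK

def address_of(offset: int) -> int:
    # like an AddressOf(x) macro defined as (x | 0x08000000)
    return offset | ROM_BASE

current = 0x08B2F000    # CURRENTOFFSET, now an address (example value)
upper_bound = 0xB30000  # UpperBound given as a ROM offset (example value)

print(upper_bound - current < 0)                # True: the old-style ASSERT trips
print(hex(upper_bound - offset_of(current)))    # 0x1000 bytes of room left
print(hex(address_of(upper_bound) - current))   # 0x1000 as well
```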
## A quick guide to upgrading safely (using 7-Zip and HxD)

Because there are some changes that will break existing buildfiles, here is a quick guide to locating and fixing any such breakage. I will be doing this process on Pokemblem.
We will be using 7-Zip for its ability to easily generate SHA256 digests and check files against them. One can of course do the same thing with another equivalent tool, such as the sha256sum command on Linux.
First, do build your binary using your working ColorzCore version. Consider building using the --nocash-sym option to generate a symbol map you can refer to in case you need to locate some problematic bytes.
Then, using the 7-Zip submenu when right-clicking the produced ROM (on Windows 11, you may first need to click "Show more options" or similar), generate a SHA256 digest of your built ROM. Then back up your ROM and symbol map somewhere they won't get overwritten by subsequent builds.

Now, you can replace your ColorzCore binary with a new one!
Build your binary using the new ColorzCore and keep an eye on any warnings (or, if you're unlucky, errors) you may get. Almost all known breaking changes are also diagnosed as warnings, so if you address these warnings, you are likely to fix most, if not all, of the breakages that may have been introduced.

Note: all "path is not portable" warnings can be ignored for now, as they don't actually affect the build. You can disable them by adding the --warnings:no-nonportable-pathnames switch to your command. You should consider fixing them eventually, though!
Once you have fixed your warnings, verify that the built ROM is the same as the one from before you updated: using the 7-Zip submenu, this time right-clicking the .sha256 file, click verify archive to compare your ROM to the digest of the original. If it tells you there are no problems: you're gucci. Otherwise, we do step 2.
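If 7-Zip isn't at hand, the digest step can also be scripted. This Python sketch uses only the standard `hashlib` module. `sha256_of` and `same_build` are hypothetical helper names. The sketch compares against a digest string you saved yourself rather than parsing 7-Zip's .sha256 file.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_build(saved_digest: str, new_rom_path: str) -> bool:
    """True if the newly built ROM matches a previously saved hex digest."""
    return sha256_of(new_rom_path) == saved_digest.strip().lower()
```

For example, record `sha256_of("old.gba")` before upgrading, then call `same_build(saved, "new.gba")` after rebuilding (the file names are placeholders).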

Step 2: Compare your two binaries using a tool that can do so. I will use HxD for the example, but a tool such as the cmp command will also work if you know how to use it.
In HxD, open your new and old ROM and select Whatever -> Compare. It will jump to the first different byte in the ROM. You now need to track where that comes from. You should have a .SYM file available that lists the addresses of all your labels. Once you have located the problematic bit, fix it and rebuild. HxD automatically reloads binaries when they are changed, but you may need to reset the compare. Repeat until you don't have differences anymore.
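The comparison step can be scripted as well. The sketch below finds the first differing byte, converts its ROM offset to an address, and picks the closest preceding symbol. The helper names and file names are hypothetical. It assumes you have already loaded your .SYM file into a dictionary mapping addresses to names, since the file format is not covered here.

```python
import bisect
from pathlib import Path
from typing import Dict, Optional

ROM_BASE = 0x08000000

def first_difference(old: bytes, new: bytes) -> Optional[int]:
    """ROM offset of the first byte that differs, or None if identical."""
    for offset, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return offset
    if len(old) != len(new):
        return min(len(old), len(new))  # one ROM is a prefix of the other
    return None

def nearest_symbol(symbols: Dict[int, str], address: int) -> Optional[str]:
    """Name of the symbol with the highest address <= address, if any."""
    addresses = sorted(symbols)
    i = bisect.bisect_right(addresses, address)
    return symbols[addresses[i - 1]] if i > 0 else None

def describe_difference(old_path: str, new_path: str) -> str:
    offset = first_difference(Path(old_path).read_bytes(), Path(new_path).read_bytes())
    if offset is None:
        return "ROMs are identical"
    return f"first difference at offset 0x{offset:X} (address 0x{ROM_BASE + offset:08X})"
```

For example, `print(describe_difference("old.gba", "new.gba"))`, then look up the printed address with `nearest_symbol`.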

Not all changes are actually problematic. It turns out that all the byte changes in Pokemblem (the project I am testing) fall into one of these three categories:

- A change in which label is referenced by POIN, because some labels defined in a local scope shadow a global label. Because of a bug in previous versions of ColorzCore, such local labels were only visible after their definition.
- `POIN <RAM address>`, always from within lyn-generated events.
- In `Patches\ChapterTransition\Music\music installer.event`, one of the offsets given to VoiceDirect is incorrect (0x04C4758C, an extra C at the end). Because this value is not within the ROM offset range, it is no longer translated to an address, whereas it was before.

If you find something that broke that was not documented, please let me (stan/nat) know so I can add it in this document.