First, the results from the earlier test also occur with just returning immediately in J3DGDSetChanCtrl; the patch to GXSetChanCtrl is not needed. See https://www.youtube.com/watch?v=SdcTgENrxwo&t=135s (I've discovered that my TV has a video out jack, so I can record without latency; however, for whatever reason the captured video has a lot of dropped frames. This might just be my capture device being terrible though.) (The returning immediately is done by writing 4e800020, i.e. blr, to 802f3630 in memory/0030e570 in the disc image; the new file has a SHA-1 of 14c10787f0ace2e84d01a3149130474d5d5ae56b or an MD5 of 6364a24233170e505834b7082f18fa8d.)
The main thing I experimented with was more fine-grained testing of the different channels in J3DGDSetChanCtrl.
The following code is probably not important™ and thus, it's free real estate; we can replace it as needed hopefully without side effects.
We want to use lives, located at 80578a04 (dynamically placed there, but it seems consistent enough for cheat codes to use it, so...). The benefit of using lives is that if there's one thing that's easy to do in secret stages, it's dying. The address is formed as 0x80578a04 = 0x80580000 - 0x10000 + 0x8A04, i.e. (-0x7FA80000) + (-0x75FC) reinterpreted as an unsigned 32-bit value (this is what the compiler usually generates; if there's a less awkward way of doing it I'm not aware of one).
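The arithmetic above can be sketched in Python. This is only an illustration of how an address splits into a lis immediate and a signed 16-bit lwz displacement; the helper names split_for_lis_lwz and to_signed16 are made up for this example.

```python
def split_for_lis_lwz(addr):
    """Split a 32-bit address into (hi, lo) with (hi << 16) + lo == addr.

    lo is the signed 16-bit displacement used by lwz; because the CPU
    sign-extends it, hi has to be bumped up by one when lo is negative.
    """
    lo = addr & 0xFFFF
    if lo >= 0x8000:
        lo -= 0x10000
    hi = ((addr - lo) >> 16) & 0xFFFF
    return hi, lo


def to_signed16(value):
    """Show a 16-bit immediate the way a disassembler prints it."""
    return value - 0x10000 if value & 0x8000 else value


hi, lo = split_for_lis_lwz(0x80578A04)
print(hex(hi), hex(to_signed16(hi)), hex(lo))
# Recombining gives the original address back.
print(hex(((hi << 16) + lo) & 0xFFFFFFFF))
```

For 80578a04 this gives the 0x8058 / -0x75fc pair that shows up in the lis and lwz instructions in the listings.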
Lots of fiddling and code that doesn't work, describing how I got to the code that does. Byte values or instructions may be wrong, as these were all only partially tested.
Also, we need a register to put this stuff in. r12 is probably fine since it's volatile by convention, but if I just ignore the diffusefunc = 0 bit, both r3 and r0 are already free for use.
Here's what I want to happen:
Lives | 0 | 1 | 2 | 3 |
Chan0 | Y | N | N | N |
Chan1 | N | Y | N | N |
Chan2 | N | N | Y | N |
Chan3 | N | N | N | Y |
Chan4 | Y1| N | Y2| N |
Chan5 | N | Y1| N | Y2|
Everything but Y2 is handled by (chan & 3) == lives.
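As a sanity check of the table, here is a small Python sketch of the two tests. It assumes the second (Y2) test only applies to chan 4 and 5, and that lives is in the range 0 to 3; the function name lighting_disabled is hypothetical.

```python
def lighting_disabled(chan, lives):
    """Return "Y"/"Y1"/"Y2" if lighting is cleared for chan, else "N"."""
    # First test: (chan - lives) & 3 == 0, i.e. chan & 3 == lives & 3.
    if (chan - lives) & 3 == 0:
        return "Y1" if chan >= 4 else "Y"
    # Second test (assumed to run for chan 4/5 only):
    # (chan - 2 - lives) & 3 == 0.
    if chan >= 4 and (chan - 2 - lives) & 3 == 0:
        return "Y2"
    return "N"


print("Lives |", " | ".join(str(lives) for lives in range(4)))
for chan in range(6):
    row = [lighting_disabled(chan, lives) for lives in range(4)]
    print(f"Chan{chan} |", " | ".join(row))
```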
Temp code:
802f3634 3d 80 80 58 lis r12,-0x7fa8 ; r12 contains 80580000
802f3648 81 8c 8a 04 lwz r12,-0x75fc(r12) ; r12 contains value at 80578a04 (# lives)
802f364c 7c 03 60 40 cmplw r3,r12 ; compare channel with lives
Temp code 2:
802f3634 3c 60 80 58 lis r3,-0x7fa8 ; r3 contains 80580000
802f3648 80 63 8a 04 lwz r3,-0x75fc(r3) ; r3 contains value at 80578a04 (# lives)
802f36c0 57 40 07 be rlwinm r0,r26,0x0,0x1e,0x1f ; r0 = r26 & 3, or r0 = chan & 3, i.e. treat 4/5 as 0/1
802f3684 7d 83 00 51 subf. r12,r3,r0 ; r12 = r0 - r3; update condition flags (i.e. compare r3 and r0, but also store things for later)
802f36b4 40 82 00 08 bne LAB_802f36bc ; jump past next instruction if r3 != r0
; r27 holds the value to write.
Temp code 3, doesn't save any instructions, though it does save r0 (not helpful); bytes may be wrong?:
802f3634 3c 60 80 58 lis r3,-0x7fa8 ; r3 contains 80580000
802f3648 80 63 8a 04 lwz r3,-0x75fc(r3) ; r3 contains value at 80578a04 (# lives)
802f3684 7d 83 00 51 subf r3,r3,r26 ; r3 = r26 - r3; i.e. r3 is now chan - lives
802f36c0 57 40 07 be rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - lives) & 3, i.e. treat 4/5 as 0/1
802f36b4 40 82 00 08 bne LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. chan & 3 != lives & 3
802f36b8 73 7b ff fd andi. r27,r27,0xfffd ; clear enablelighting bit (r27 holds value to write)
Temp code 4, doesn't save any instructions, though it does save r0 (not helpful):
802f3648 7c 9f 23 78 or r31,enablelighting,enablelighting ; save a copy of enablelighting in r31 (seemingly safe?)
802f364c 57 ff 0d fc rlwinm r31,r31,0x1,0x17,0x1e ; and shift it left 1 for later use
802f3634 3c 60 80 58 lis r3,-0x7fa8 ; r3 contains 80580000
802f3648 80 63 8a 04 lwz r3,-0x75fc(r3) ; r3 contains value at 80578a04 (# lives)
802f3684 7c 63 d0 51 subf r3,r3,r26 ; r3 = r26 - r3; i.e. r3 is now chan - lives
802f36c0 54 6c 07 bf rlwinm. r12,r3,0x0,0x1e,0x1f ; r12 = r3 & 3, or r12 = (chan - lives) & 3, i.e. treat 4/5 as 0/1
802f36b4 40 82 00 08 bne LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. chan & 3 != lives & 3
802f36b8 73 7b ff fd andi. r27,r27,0xfffd ; clear enablelighting bit (r27 holds value to write)
802f37a8 7f 9c fb 78 or r28,r28,r31 ; Re-add saved enablelighting bit. r28 now contains the value that's written (r28 = r27 masked to a byte for some reason).
802f37ac 2c 0c 00 02 cmpwi r12,0x2 ; (chan - lives) & 3 == 2?
802f37b0 40 82 00 08 bne LAB_802f37b8 ; skip if not
802f37b4 73 9c ff fd andi. r28,r28,0xfffd ; clear again
802f37b8 60 00 00 00 ori r0,r0,0x0 ; nop
802f37bc 60 00 00 00 ori r0,r0,0x0 ; nop
Temp code 5, doesn't save any instructions, though it does save r0 (not helpful):
802f3648 7c 9f 23 78 or r31,enablelighting,enablelighting ; save a copy of enablelighting in r31 (seemingly safe?)
802f364c 57 ff 0d fc rlwinm r31,r31,0x1,0x17,0x1e ; and shift it left 1 for later use
802f3634 3c 60 80 58 lis r3,-0x7fa8 ; r3 contains 80580000
802f3648 80 63 8a 04 lwz r3,-0x75fc(r3) ; r3 contains value at 80578a04 (# lives)
802f3680 7d 83 d0 50 subf r12,r3,r26 ; r12 = r26 - r3; i.e. r12 is now chan - lives
802f3684 55 83 07 bf rlwinm. r3,r12,0x0,0x1e,0x1f ; r3 = r12 & 3, but this is only done for comparison, i.e. treat 4/5 as 0/1, looking for (chan - lives) & 3 == 0
802f36b4 40 82 00 08 bne LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 73 7b ff fd andi. r27,r27,0xfffd ; clear enablelighting bit (r27 holds value to write)
802f37a8 7f 9c fb 78 or r28,r28,r31 ; Re-add saved enablelighting bit. r28 now contains the value that's written (r28 = r27 masked to a byte for some reason).
802f37ac 39 8c ff fe subi r12,r12,0x2 ; r12 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r12 = (2 - lives) or (3 - lives)
802f37b0 55 8c 07 bf rlwinm. r12,r12,0x0,0x1e,0x1f ; r12 = r12 & 3, or r12 = (chan - 2 - lives) & 3
802f37b4 40 82 00 08 bne LAB_802f37bc ; Skip if (chan - 2 - lives) & 3 != 0. For chan == 4, this tests (2 - lives) & 3 != 0, i.e. (lives & 3) != 2. Similar for chan == 5.
802f37b8 73 9c ff fd andi. r28,r28,0xfffd ; clear again
802f37bc 60 00 00 00 ori r0,r0,0x0 ; nop
Well, that's sufficiently golfed to fit, but decompiles in a confusing way. And I still have a spare instruction which needs to be replaced anyways.
802f3648 7c 9f 23 78 or r31,r4,r4 ; save a copy of enablelighting in r31 (seemingly safe?)
802f364c 57 ff 0d fc rlwinm r31,r31,0x1,0x17,0x1e ; and shift it left 1 for later use
802f3660 3c 60 80 58 lis r3,-0x7fa8 ; r3 contains 80580000
802f3664 81 83 8a 04 lwz r12,-0x75fc(r3) ; r12 contains value at 80578a04 (# lives)
802f3680 7c 6c d0 50 subf r3,r12,r26 ; r3 = r26 - r12; i.e. r3 is now chan - lives
802f3684 54 63 07 bf rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1
802f36b4 40 82 00 08 bne LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 73 7b ff fd andi. r27,r27,0xfffd ; clear enablelighting bit (r27 holds value to write)
802f37a8 7f 9c fb 78 or r28,r28,r31 ; Re-add saved enablelighting bit, in case we cleared it before. (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe subi r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 6c 18 50 subf r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08 bne LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0. For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 73 9c ff fd andi. r28,r28,0xfffd ; clear enablelighting bit
Actually, andc is probably better than andi. here too...
802f3648 7c 9f 23 78 or r31,r4,r4 ; save a copy of enablelighting in r31 (seemingly safe?)
802f364c 57 ff 0d fc rlwinm r31,r31,0x1,0x17,0x1e ; and shift it left 1 for later use
802f3660 3c 60 80 58 lis r3,-0x7fa8 ; r3 contains 80580000
802f3664 81 83 8a 04 lwz r12,-0x75fc(r3) ; r12 contains value at 80578a04 (# lives)
802f3680 7c 6c d0 50 subf r3,r12,r26 ; r3 = r26 - r12; i.e. r3 is now chan - lives
802f3684 54 63 07 bf rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1
802f36b4 40 82 00 08 bne LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 7f 7b f8 78 andc r27,r27,r31 ; clear enablelighting bit if set (r27 holds value to write)
802f37a8 7f 9c fb 78 or r28,r28,r31 ; Re-add saved enablelighting bit, in case we cleared it before. (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe subi r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 6c 18 50 subf r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08 bne LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0. For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 7f 9c f8 78 andc r28,r28,r31 ; clear enablelighting bit if set
... and after all that, it turns out r31 is actually used... but r25 isn't. OK, fine.
802f3648 7c 99 23 78 or r25,r4,r4 ; save a copy of enablelighting in r25
802f364c 57 39 0d fc rlwinm r25,r25,0x1,0x17,0x1e ; and shift it left 1 for later use
802f3660 3c 60 80 58 lis r3,-0x7fa8 ; r3 contains 80580000
802f3664 81 83 8a 04 lwz r12,-0x75fc(r3) ; r12 contains value at 80578a04 (# lives)
802f3680 7c 6c d0 50 subf r3,r12,r26 ; r3 = r26 - r12; i.e. r3 is now chan - lives
802f3684 54 63 07 bf rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1
802f36b4 40 82 00 08 bne LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 7f 7b c8 78 andc r27,r27,r25 ; clear enablelighting bit if set (r27 holds value to write)
802f37a8 7f 9c cb 78 or r28,r28,r25 ; Re-add saved enablelighting bit, in case we cleared it before. (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe subi r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 6c 18 50 subf r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08 bne LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0. For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 7f 9c c8 78 andc r28,r28,r25 ; clear enablelighting bit if set
802f3648/0030e588: 7c992378 57390dfc
802f3660/0030e5a0: 3c608058 81838a04
802f3680/0030e5c0: 7c6cd050 546307bf
802f36b4/0030e5f4: 40820008 7f7bc878
802f37a8/0030e6e8: 7f9ccb78 387afffe 7c6c1850 546307bf 40820008 7f9cc878
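For reference, a minimal Python sketch of applying the byte patches listed above to a disc image buffer. The constant offset between memory address and disc offset is just what the address pairs in this post imply (0x802f3648 - 0x0030e588); it is an assumption that it holds for every address in this function, and the names MEM_TO_DISC_DELTA, mem_to_disc and apply_patches are made up for the example.

```python
# Delta implied by the "memory/disc" address pairs in this post (assumption:
# constant across the whole patched region).
MEM_TO_DISC_DELTA = 0x802F3648 - 0x0030E588

PATCHES = {
    0x802F3648: "7c992378 57390dfc",
    0x802F3660: "3c608058 81838a04",
    0x802F3680: "7c6cd050 546307bf",
    0x802F36B4: "40820008 7f7bc878",
    0x802F37A8: "7f9ccb78 387afffe 7c6c1850 546307bf 40820008 7f9cc878",
}


def mem_to_disc(addr):
    """Convert a memory address to an offset in the disc image."""
    return addr - MEM_TO_DISC_DELTA


def apply_patches(image, patches):
    """Overwrite bytes in a mutable buffer (e.g. a bytearray) in place."""
    for addr, hex_words in patches.items():
        data = bytes.fromhex(hex_words)  # whitespace between words is ignored
        offset = mem_to_disc(addr)
        image[offset:offset + len(data)] = data


# Demo on a zero-filled stand-in buffer rather than a real image.
image = bytearray(0x30E700)
apply_patches(image, PATCHES)
print(hex(mem_to_disc(0x802F37A8)), image[0x30E588:0x30E590].hex())
```

With a real image one would read the file into a bytearray (or seek and write in place) instead of using the zero-filled buffer.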
The patched disc image has a SHA-1 of 7fd58c4f3d43c563a835d50d8ecd8ee37b1a2809 or an MD5 of 221a0a2262f711f071f10b693ca57e3d.
... and the game fails to get to the title screen. Simplifying my changes to just this (using andi. and not caring about chan=4 or 5):
802f3660/0030e5a0: 3c608058 81838a04
802f3680/0030e5c0: 7c6cd050 546307bf
802f36b4/0030e5f4: 40820008 737bfffd
makes things work (though I don't like not caring about chan=4/5), but just 802f3648/0030e588: 7c992378 57390dfc
breaks it. So r25 is important, or the diffusefunc = 0 case is important. (Note that this is still in Dolphin.) The game boots when I nop out those addresses, so I guess it must be r25. (Though, a breakpoint at 802f364c is hit on startup, so the diffusefunc = 0 case is triggered, but diffusefunc was already 0 then.) Huh. But both callers of J3DGDSetChanCtrl load a new value for r25 after the call, so that doesn't make sense. It fails on console too, it seems.
... oh, no, both of the callers of J3DGDSetChanCtrl have the call in a loop, and they re-use r25 each time in it. I get similar hangs using r24, though I don't see where it's used. r26 is saved by J3DGDSetChanCtrl, but I don't see other functions saving r25/r24 before they use them; I don't know what the deal with that is. ... oh! It's stmw, which doesn't store just 1 register, but a bunch of them (the named register through r31). So... I guess I can do this all properly, though it requires a bit of messing about with the neighboring instructions too. I'm not sure I've done it right.
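To make the stmw point concrete, a tiny Python sketch of which registers one stmw covers and how many bytes that takes on the stack. The helper names are hypothetical; the 0x20 offset and 0x48 frame size are the ones used in the listing that follows.

```python
def stmw_registers(first_reg):
    """stmw rN stores rN through r31 to consecutive words."""
    return list(range(first_reg, 32))


def stmw_bytes(first_reg):
    """Each saved register takes 4 bytes."""
    return 4 * (32 - first_reg)


# Saving from r24 instead of r26 adds two registers, i.e. 8 more bytes.
print(stmw_registers(26), stmw_bytes(26))
print(stmw_registers(24), stmw_bytes(24))
# stmw r24,0x20(r1) ends at this offset, which has to fit in the 0x48 frame.
print(hex(0x20 + stmw_bytes(24)))
```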
EDIT: I previously said that the following was the final assembly, but it doesn't match what I actually used in practice (and also doesn't fully replace r12 with r24, and the bytes aren't correct for at least one instruction):
802f363c 94 21 ff b8 stwu r1,-0x48(r1) ; make more room
802f3640 bf 01 00 20 stmw r24,0x20(r1) ; save r24, r25 in addition to r26+
802f3648 7c 99 23 78 or r25,r4,r4 ; save a copy of enablelighting in r25
802f364c 57 39 0d fc rlwinm r25,r25,0x1,0x17,0x1e ; and shift it left 1 for later use
802f3660 3c 60 80 58 lis r3,-0x7fa8 ; r3 contains 80580000
802f3664 83 03 8a 04 lwz r24,-0x75fc(r3) ; r24 contains value at 80578a04 (# lives)
802f3680 7c 78 c0 50 subf r3,r24,r24 ; r3 = r26 - r24; i.e. r3 is now chan - lives
802f3684 54 63 07 bf rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1
802f36b4 40 82 00 08 bne LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 7f 7b c8 78 andc r27,r27,r25 ; clear enablelighting bit if set (r27 holds value to write)
802f37a8 7f 9c cb 78 or r28,r28,r25 ; Re-add saved enablelighting bit, in case we cleared it before. (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe subi r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 78 18 50 subf r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf rlwinm. r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08 bne LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0. For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 7f 9c c8 78 andc r28,r28,r25 ; clear enablelighting bit if set
802f3888 bb 01 00 20 lmw r24,0x20(r1) ; restore r24, r25, in addition to r26+
802f388c 80 01 00 4c lwz r0,0x4c(r1)
802f3890 38 21 00 48 addi r1,r1,0x48
What I actually used still had r12 for lives, leaving r24 unused (even though I saved it). Oops.
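One way to cross-check the hex bytes against the mnemonics is to decode the register fields by hand. The Python sketch below only handles subf (primary opcode 31, extended opcode 40) and uses made-up helper names; it is not a general disassembler.

```python
def decode_fields(word):
    """Pull the XO-form fields out of a 32-bit PowerPC instruction word."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rD": (word >> 21) & 0x1F,
        "rA": (word >> 16) & 0x1F,
        "rB": (word >> 11) & 0x1F,
        "xo": (word >> 1) & 0x1FF,
        "rc": word & 1,
    }


def describe_subf(word):
    """Render a subf/subf. word as text; subf rD,rA,rB computes rB - rA."""
    f = decode_fields(word)
    if f["opcode"] != 31 or f["xo"] != 40:
        raise ValueError("not a subf instruction")
    dot = "." if f["rc"] else ""
    return f"subf{dot} r{f['rD']},r{f['rA']},r{f['rB']}"


print(describe_subf(0x7C6CD050))  # bytes from the r12 version at 802f3680
print(describe_subf(0x7C78C050))  # bytes from the listing above at 802f3680
```

Decoded this way, 7c 78 c0 50 comes out with r24 as both source registers, where the comment wants r26 - r24, which fits the note that the bytes in that listing aren't all correct.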
When patched this way, the disc image has a SHA-1 of 645ced238d843211a0ba927aefeaa81aaa0e90f5 and an MD5 of fe8f06006038d7190d6bc22dd5e17402.
The main thing to note is that no debug cubes show up in any of the configurations. So the software renderer patch I made earlier isn't correct; enablelighting seems to be unrelated to when the debug cubes show up. On the other hand, note that when xfmem.color[0] has lighting disabled, the platforms no longer have shading as they flip, and when xfmem.color[1] has lighting disabled, Mario and the coin switch are bright, and coins are pure white. Note also that on collecting a 1-up (which should have switched it from xfmem.color[1] to xfmem.alpha[0] being unlit), Mario's hat no longer is bright, but his body is, and the coins are still white. I don't know why that is, but it does also happen in Dolphin so I'm not too worried about it. I couldn't see any changes for xfmem.alpha[0] or xfmem.alpha[1].