Pokechu22/DebugCubeExperiments.md

## DebugCubeExperiments.md

      
            
          
    This started as a random patch to Dolphin's software renderer that fixed the "debug cubes" in Super Mario Sunshine (issue 8059) without causing the bridge in a custom Mario Kart Wii track to disappear (issue 12379) or breaking the reflections in one Super Monkey Ball stage (issue 12366).  The patch itself seems to be wrong on later testing, but I'm still using this gist to document my research here.
Note that this is mostly just personal notes, and there's no guarantee that anything is correct (much of this is speculation that I later prove wrong).  I've avoided editing older comments to correct them, and there may be silly mistakes there that I've since found (usually noted in the next update).

  
## explanation.md

      
            
          
    This is somewhat hacky, but seems to remove the Super Mario Sunshine debug cubes while still rendering the bridge in the custom Mario Kart Wii track (#12379) and the Monkey Ball reflections (#12366).  The patch is for the software renderer only (which also did not render the debug cubes prior to d6ce8ee, unlike the hardware renderer which did render them prior to 51724c1).
The debug cubes are object 248, the bridge is object 31, and the reflection seems to be object 617/618 on frame 1 (though the reflection is harder to isolate, since it doesn't seem to render when the objects before it are skipped).
I have not hardware tested this at all, nor am I sure if this even makes sense, but here are the values of xfmem.color[0], xfmem.color[1], xfmem.alpha[0], and xfmem.alpha[1] for those objects:
xfmem.color[0] values

| Field | Mario Sunshine | Mario Kart | Monkey Ball |
| --- | --- | --- | --- |
| Hex | 70f | 702 | 706 |
| matsource | true | false | false |
| enablelighting | true | true | true |
| lightMask0_3 | 3 | 0 | 1 |
| ambsource | false | false | false |
| diffusefunc | CLAMP (2) | CLAMP (2) | CLAMP (2) |
| attnfunc | SPOT (3) | SPOT (3) | SPOT (3) |
| lightMask4_7 | 0 | 0 | 0 |

xfmem.color[1] values

| Field | Mario Sunshine | Mario Kart | Monkey Ball |
| --- | --- | --- | --- |
| Hex | 202 | 000 | 401 |
| matsource | false | false | true |
| enablelighting | true | false | false |
| lightMask0_3 | 0 | 0 | 0 |
| ambsource | false | false | false |
| diffusefunc | NONE (0) | NONE (0) | NONE (0) |
| attnfunc | SPEC (1) | NONE (0) | DIR (2) |
| lightMask4_7 | 0 | 0 | 0 |

xfmem.alpha[0] values

| Field | Mario Sunshine | Mario Kart | Monkey Ball |
| --- | --- | --- | --- |
| Hex | 681 | 701 | 441 |
| matsource | true | true | true |
| enablelighting | false | false | false |
| lightMask0_3 | 0 | 0 | 0 |
| ambsource | false | false | true |
| diffusefunc | SIGN (1) | CLAMP (2) | NONE (0) |
| attnfunc | SPOT (3) | SPOT (3) | DIR (2) |
| lightMask4_7 | 0 | 0 | 0 |

xfmem.alpha[1] values

| Field | Mario Sunshine | Mario Kart | Monkey Ball |
| --- | --- | --- | --- |
| Hex | 400 | 000 | 401 |
| matsource | false | false | true |
| enablelighting | false | false | false |
| lightMask0_3 | 0 | 0 | 0 |
| ambsource | false | false | false |
| diffusefunc | NONE (0) | NONE (0) | NONE (0) |
| attnfunc | DIR (2) | NONE (0) | DIR (2) |
| lightMask4_7 | 0 | 0 | 0 |
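
As a cross-check on the tables, here is a minimal Python sketch that splits the raw hex values into fields and then evaluates the condition used by the patch (`xfmem.color[0].enablelighting && xfmem.color[1].enablelighting`). The bit positions are an assumption inferred from the table values themselves, not copied from Dolphin's headers, and the helper names `decode_lit_channel` and `patched_default` are made up for this example.

```python
# Bit positions are an assumption inferred from the tables in this gist
# (e.g. 0x70f has to decode to lightMask0_3 = 3, diffusefunc = 2, attnfunc = 3).
DIFFUSE_NAMES = {0: "NONE", 1: "SIGN", 2: "CLAMP"}
ATTN_NAMES = {0: "NONE", 1: "SPEC", 2: "DIR", 3: "SPOT"}


def decode_lit_channel(raw):
    """Split a raw xfmem.color[n] / xfmem.alpha[n] value into named fields."""
    return {
        "matsource": bool(raw & 1),
        "enablelighting": bool((raw >> 1) & 1),
        "lightMask0_3": (raw >> 2) & 0xF,
        "ambsource": bool((raw >> 6) & 1),
        "diffusefunc": DIFFUSE_NAMES.get((raw >> 7) & 3, "unknown"),
        "attnfunc": ATTN_NAMES[(raw >> 9) & 3],
        "lightMask4_7": (raw >> 11) & 0xF,
    }


def patched_default(color0_raw, color1_raw):
    """Value the patch stores in color[0] of a missing vertex color."""
    both_lit = (decode_lit_channel(color0_raw)["enablelighting"]
                and decode_lit_channel(color1_raw)["enablelighting"])
    return 0 if both_lit else 255


# (xfmem.color[0], xfmem.color[1]) pairs copied from the tables above.
SAMPLES = {
    "Mario Sunshine": (0x70F, 0x202),
    "Mario Kart": (0x702, 0x000),
    "Monkey Ball": (0x706, 0x401),
}

for game, (color0, color1) in SAMPLES.items():
    print(game, patched_default(color0, color1))
```

With these inputs only the Mario Sunshine object has lighting enabled on both color channels, so it is the only one that gets 0; the other two get 255. That is the whole basis on which the patch tells the three cases apart.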


## SWVertexLoader.diff
diff --git a/Source/Core/VideoBackends/Software/SWVertexLoader.cpp b/Source/Core/VideoBackends/Software/SWVertexLoader.cpp
index 380d6dda0d..95c026c8d7 100644
--- a/Source/Core/VideoBackends/Software/SWVertexLoader.cpp
+++ b/Source/Core/VideoBackends/Software/SWVertexLoader.cpp
@@ -186,10 +186,7 @@ static void ParseColorAttributes(InputVertexData* dst, DataReader& src,
                                  const PortableVertexDeclaration& vdec)
 {
   const auto set_default_color = [](u8* color, int i) {
-    // The default alpha channel seems to depend on the number of components in the vertex format.
-    const auto& g0 = g_main_cp_state.vtx_attr[g_main_cp_state.last_id].g0;
-    const u32 color_elements = i == 0 ? g0.Color0Elements : g0.Color1Elements;
-    color[0] = color_elements == 0 ? 255 : 0;
+    color[0] = (xfmem.color[0].enablelighting && xfmem.color[1].enablelighting) ? 0 : 255;
     color[1] = 255;
     color[2] = 255;
     color[3] = 255;

## update1.md

      
            
          
    By writing 7c6b1b7841800008396bfffc2c0b00014082000c388000006000000060000000 to memory at 8035f6e8 (on the disc image, at 0037a628), the following instructions are written:
8035f6e8: 7c6b1b78 or r11,r3,r3
8035f6ec: 41800008 blt ...
8035f6f0: 396bfffc subi r11,r11,4
8035f6f4: 2c0b0001 cmpwi r11, 1
8035f6f8: 4082000c unchanged bne
8035f6fc: 38800000 li r4, 0 ; disable lighting
8035f700: 60000000 nop
8035f704: 60000000 nop

(at 8035f6d4, there is an unchanged cmpwi r3, 4)
This changes GXSetChanCtrl (at 8035f6d0), replacing the following:
  if (chan == 4) {
    chanToUse = 0;
  }
  else {
    chanToUse = chan;
    if (chan == 5) {
      chanToUse = 1;
    }
  }

with this:
  chanToUse = chan;
  if (chan >= 4) {
    chanToUse = chan - 4;
  }
  if (chanToUse == 1) {
    enablelighting = 0;
  }

I.e., it maintains the behavior of chanToUse, but disables lighting for xfmem.color[1] (and sometimes xfmem.alpha[1], as 4/5 indicate to update both of them).
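
To make the "maintains the behavior of chanToUse" claim concrete, here is a small Python model of the two pseudocode variants above. It is only a sketch of the pseudocode, not of the real GXSetChanCtrl; the function names are invented, and the assumption that `chan` only takes the values 0 through 5 is mine.

```python
def original_chan_to_use(chan):
    """Channel remapping from the original pseudocode."""
    if chan == 4:
        return 0
    if chan == 5:
        return 1
    return chan


def patched_chan_ctrl(chan, enablelighting):
    """Remapping plus the forced lighting-off from the patched pseudocode."""
    chan_to_use = chan - 4 if chan >= 4 else chan
    if chan_to_use == 1:
        enablelighting = 0  # corresponds to the inserted "li r4, 0"
    return chan_to_use, enablelighting


# Assumption: only channel values 0..5 are ever passed in.
for chan in range(6):
    print(chan, original_chan_to_use(chan), patched_chan_ctrl(chan, 1))
```

For every value in that range the remapped channel is identical in both versions; the only difference is that channels 1 and 5 come out with lighting forced off.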
This change did not cause the debug cubes to show up in the software renderer with my patch, nor did they show up on console.  More surprisingly, there was still a write to xfmem.color[1] with lighting enabled.  (Incidentally, there are multiple writes to it during the drawing of the debug cubes, something I did not realize before; the tables I previously made are not that meaningful, it seems.)  That write was from JSystem::J3DGDSetChanCtrl (at 802f3630), which is only called by 2 things (GXSetChanCtrl is used by 137, and GDSetChanCtrl at 80366be8 is also used by 2, but doesn't seem to be hit in the area with the debug cubes).  J3DGDSetChanCtrl's 2 users are J3DColorBlockLightOn::load and J3DColorBlockLightOff::load.  If I combine the previous change with a blr at the start of J3DGDSetChanCtrl (too lazy to try and save space to insert a write), then the world looks a bit crazy, but more importantly the debug cubes start showing up in the software renderer with my patch.  (To do this, write 4e800020 to 802f3630 in memory, or 0030e570 on the disc image.)  Even more importantly, they show up on console, too!  Here's what it looks like: https://www.youtube.com/watch?v=Wt6fFUOzQTk&t=274s
Further testing is needed to isolate and verify exactly what I've done.

  
## update2.md

      
            
          
First, the results from the earlier test also occur with just returning immediately in J3DGDSetChanCtrl; the patch to GXSetChanCtrl is not needed.  See https://www.youtube.com/watch?v=SdcTgENrxwo&t=135s (I've discovered that my TV has a video out jack, so I can record without latency; however, for whatever reason the captured video has a lot of dropped frames.  This might just be my capture device being terrible, though.)  (Returning immediately is done by writing 4e800020 to 802f3630 in memory, or 0030e570 in the disc image; the new file has a SHA-1 of 14c10787f0ace2e84d01a3149130474d5d5ae56b or an MD5 of 6364a24233170e505834b7082f18fa8d.)
The main thing I experimented with was more fine-grained testing of the different channels in J3DGDSetChanCtrl.
The following code is probably not important™ and thus, it's free real estate; we can replace it as needed hopefully without side effects.
  if (attnfunc == 0) { // r9
    diffusefunc = 0; // r8
  }
  // ... setup stuff ...
  if (__GDCurrentDL->end < __GDCurrentDL->cur + 9) {
    GDOverflowed();
  }
  // ... Writes to xfmem.color or xfmem.alpha
  if (chan - 4 < 2) { // this if must remain; chan is r3 initially but moved to r26
    if (__GDCurrentDL->end < __GDCurrentDL->cur + 9) {
      GDOverflowed();
    }
    // ... Writes to xfmem.alpha in addition to earlier writes to xfmem.color
  }

implemented as
802f3634 2c 09 00 00     cmpwi      r9,0x0

802f3648 40 82 00 08     bne        LAB_802f3650
802f364c 39 00 00 00     li         r8,0x0

802f3660 80 6a 00 08     lwz        r3,0x8(r10) ; chan no longer in r3
802f3664 80 0a 00 0c     lwz        r0,0xc(r10)

802f3680 38 63 00 09     addi       r3,r3,0x9
802f3684 7c 03 00 40     cmplw      r3,r0

802f36b4 40 81 00 08     ble        LAB_802f36bc
802f36b8 48 07 11 7d     bl         gd::GDOverflowed

802f37a8 80 64 00 08     lwz        r3,0x8(r4)
802f37ac 80 04 00 0c     lwz        r0,0xc(r4)
802f37b0 38 63 00 09     addi       r3,r3,0x9
802f37b4 7c 03 00 40     cmplw      r3,r0
802f37b8 40 81 00 08     ble        LAB_802f37c0
802f37bc 48 07 10 79     bl         gd::GDOverflowed

We want to use lives, located at 80578a04 (dynamically placed there, but it seems consistent enough for cheat codes to use it, so...).  The benefit of using lives is that if there's one thing that's easy to do in secret stages, it's dying.  The address is built as 0x80578a04 = 0x80580000 - 0x10000 + 0x8a04, i.e. lis with -0x7fa8 (0x8058 as a signed 16-bit value) followed by a displacement of -0x75fc (0x8a04 as a signed 16-bit value).  This is what the compiler usually generates; if there's a less awkward way of doing it, I'm not aware of one.
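
The arithmetic behind that lis/lwz pair can be sketched in Python. The displacement of a load or store is sign-extended, so whenever the low half of the address is 0x8000 or above, the high half has to be bumped by one to compensate. `split_address` and `rebuild_address` are names made up for this example.

```python
def split_address(addr):
    """Return (lis immediate, displacement), both as signed 16-bit values."""
    lo = addr & 0xFFFF
    if lo >= 0x8000:
        lo -= 0x10000  # lwz/stw sign-extend the displacement
    hi = ((addr - lo) >> 16) & 0xFFFF  # compensates for a negative displacement
    if hi >= 0x8000:
        hi -= 0x10000  # how the disassembler prints the lis immediate
    return hi, lo


def rebuild_address(hi, lo):
    """What 'lis rX,hi' followed by 'lwz rY,lo(rX)' ends up addressing."""
    return (((hi & 0xFFFF) << 16) + lo) & 0xFFFFFFFF


hi, lo = split_address(0x80578A04)
print(hex(hi), hex(lo))                # -0x7fa8 -0x75fc
print(hex(rebuild_address(hi, lo)))    # 0x80578a04
```

The same split gives -0x7fd1 and 0x3648 for 802f3648, where the low half is below 0x8000 and no compensation is needed.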
Lots of fiddling and code that doesn't work, describing how I got to the code that does.  Byte values or instructions may be wrong, as these were all only partially tested.
Also, we need a register to put this stuff in.  r12 is probably fine since it's volatile by convention, but if I just ignore the diffusefunc = 0 bit, both r3 and r0 are already free for use.
Here's what I want to happen:
Lives | 0 | 1 | 2 | 3 |
Chan0 | Y | N | N | N |
Chan1 | N | Y | N | N |
Chan2 | N | N | Y | N |
Chan3 | N | N | N | Y |
Chan4 | Y1| N | Y2| N |
Chan5 | N | Y1| N | Y2|

Everything but Y2 is handled by (chan & 3) == lives.
Temp code:
802f3634 3d 80 80 58     lis        r12,-0x7fa8 ; r12 contains 80580000
802f3648 81 8c 8a 04     lwz        r12,-0x75fc(r12) ; r12 contains value at 80578a04 (# lives)
802f364c 7c 03 60 40     cmplw      r3,r12 ; compare channel with lives

Temp code 2:
802f3634 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3648 80 63 8a 04     lwz        r3,-0x75fc(r3) ; r3 contains value at 80578a04 (# lives)

802f36c0 57 40 07 be     rlwinm     r0,r26,0x0,0x1e,0x1f ; r0 = r26 & 3, or r0 = chan & 3, i.e. treat 4/5 as 0/1
802f3684 7d 83 00 51     subf.      r12,r3,r0 ; r12 = r3 - r0; update condition flags (i.e. compare r3 and r0, but also store things for later)

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if r3 != r0
; r27 holds the value to write.

Temp code 3, doesn't save any instructions, though it does save r0 (not helpful); bytes may be wrong?:
802f3634 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3648 80 63 8a 04     lwz        r3,-0x75fc(r3) ; r3 contains value at 80578a04 (# lives)

802f3684 7d 83 00 51     subf       r3,r3,r26 ; r3 = r26 - r3; i.e. r3 is now lives - chan
802f36c0 57 40 07 be     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - lives) & 3, i.e. treat 4/5 as 0/1

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. chan & 3 != lives & 3
802f36b8 73 7b ff fd     andi.      r27,r27,0xfffd ; clear enablelighting bit (r27 holds value to write)

Temp code 4, doesn't save any instructions, though it does save r0 (not helpful):
802f3648 7c 9f 23 78     or         r31,enablelighting,enablelighting ; save a copy of enablelighting in r31 (seemingly safe?)
802f364c 57 ff 0d fc     rlwinm     r31,r31,0x1,0x17,0x1e ; and shift it left 1 for later use

802f3634 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3648 80 63 8a 04     lwz        r3,-0x75fc(r3) ; r3 contains value at 80578a04 (# lives)

802f3684 7c 63 d0 51     subf       r3,r3,r26 ; r3 = r26 - r3; i.e. r3 is now chan - lives
802f36c0 54 6c 07 bf     rlwinm.    r12,r3,0x0,0x1e,0x1f ; r12 = r3 & 3, or r12 = (chan - lives) & 3, i.e. treat 4/5 as 0/1

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. chan & 3 != lives & 3
802f36b8 73 7b ff fd     andi.      r27,r27,0xfffd ; clear enablelighting bit (r27 holds value to write)

802f37a8 7f 9c fb 78     or         r28,r28,r31 ; Re-add saved enablelighting bit.  r28 now contains the value that's written (r28 = r27 masked to a byte for some reason).
802f37ac 2c 0c 00 02     cmpwi      r12,0x2 ; (chan - lives) & 3 == 2?
802f37b0 40 82 00 08     bne        LAB_802f37b8 ; skip if not
802f37b4 73 9c ff fd     andi.      r28,r28,0xfffd ; clear again
802f37b8 60 00 00 00     ori        r0,r0,0x0 ; nop
802f37bc 60 00 00 00     ori        r0,r0,0x0 ; nop

Temp code 5, doesn't save any instructions, though it does save r0 (not helpful):
802f3648 7c 9f 23 78     or         r31,enablelighting,enablelighting ; save a copy of enablelighting in r31 (seemingly safe?)
802f364c 57 ff 0d fc     rlwinm     r31,r31,0x1,0x17,0x1e ; and shift it left 1 for later use

802f3634 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3648 80 63 8a 04     lwz        r3,-0x75fc(r3) ; r3 contains value at 80578a04 (# lives)

802f3680 7d 83 d0 50     subf       r12,r3,r26 ; r12 = r26 - r3; i.e. r12 is now chan - lives
802f3684 55 83 07 bf     rlwinm.    r3,r12,0x0,0x1e,0x1f ; r3 = r12 & 3, but this is only done for comparison, i.e. treat 4/5 as 0/1, looking for (chan - lives) & 3 == 0

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 73 7b ff fd     andi.      r27,r27,0xfffd ; clear enablelighting bit (r27 holds value to write)

802f37a8 7f 9c fb 78     or         r28,r28,r31 ; Re-add saved enablelighting bit.  r28 now contains the value that's written (r28 = r27 masked to a byte for some reason).
802f37ac 39 8c ff fe     subi       r12,r12,0x2 ; r12 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r12 = (2 - lives) or (3 - lives)
802f36c0 55 8c 07 bf     rlwinm.    r12,r12,0x0,0x1e,0x1f ; r12 = r3 & 3, or r12 = (chan - 2 - lives) & 3
802f37b4 40 82 00 08     bne        LAB_802f37bc ; Skip if (chan - 2 - lives) & 3 != 0.  For chan == 4, this tests (2 - lives) & 3 != 0, i.e. (lives & 3) != 2.  Similar for chan == 5.
802f37b4 73 9c ff fd     andi.      r28,r28,0xfffd ; clear again
802f37b8 60 00 00 00     ori        r0,r0,0x0 ; nop

Well, that's sufficiently golfed to fit, but decompiles in a confusing way.  And I still have a spare instruction which needs to be replaced anyways.
802f3648 7c 9f 23 78     or         r31,r4,r4 ; save a copy of enablelighting in r31 (seemingly safe?)
802f364c 57 ff 0d fc     rlwinm     r31,r31,0x1,0x17,0x1e ; and shift it left 1 for later use

802f3660 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3664 81 83 8a 04     lwz        r12,-0x75fc(r3) ; r12 contains value at 80578a04 (# lives)

802f3680 7c 6c d0 50     subf       r3,r12,r26 ; r3 = r26 - r12; i.e. r3 is now chan - lives
802f3684 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 73 7b ff fd     andi.      r27,r27,0xfffd ; clear enablelighting bit (r27 holds value to write)

802f37a8 7f 9c fb 78     or         r28,r28,r31 ; Re-add saved enablelighting bit, in case we cleared it before.  (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe     subi       r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 6c 18 50     subf       r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08     bne        LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0.  For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 73 9c ff fd     andi.      r28,r28,0xfffd ; clear enablelighting bit

Actually, andc is probably better than andi. here too...
802f3648 7c 9f 23 78     or         r31,r4,r4 ; save a copy of enablelighting in r31 (seemingly safe?)
802f364c 57 ff 0d fc     rlwinm     r31,r31,0x1,0x17,0x1e ; and shift it left 1 for later use

802f3660 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3664 81 83 8a 04     lwz        r12,-0x75fc(r3) ; r12 contains value at 80578a04 (# lives)

802f3680 7c 6c d0 50     subf       r3,r12,r26 ; r3 = r26 - r12; i.e. r3 is now chan - lives
802f3684 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 7f 7b f8 78     andc       r27,r27,r31 ; clear enablelighting bit if set (r27 holds value to write)

802f37a8 7f 9c fb 78     or         r28,r28,r31 ; Re-add saved enablelighting bit, in case we cleared it before.  (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe     subi       r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 6c 18 50     subf       r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08     bne        LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0.  For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 7f 9c f8 78     andc       r28,r28,r31 ; clear enablelighting bit if set

... and after all that, it turns out r31 is actually used... but r25 isn't.  OK, fine.
802f3648 7c 99 23 78     or         r25,r4,r4 ; save a copy of enablelighting in r25
802f364c 57 39 0d fc     rlwinm     r25,r25,0x1,0x17,0x1e ; and shift it left 1 for later use

802f3660 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3664 81 83 8a 04     lwz        r12,-0x75fc(r3) ; r12 contains value at 80578a04 (# lives)

802f3680 7c 6c d0 50     subf       r3,r12,r26 ; r3 = r26 - r12; i.e. r3 is now chan - lives
802f3684 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 7f 7b c8 78     andc       r27,r27,r25 ; clear enablelighting bit if set (r27 holds value to write)

802f37a8 7f 9c cb 78     or         r28,r28,r25 ; Re-add saved enablelighting bit, in case we cleared it before.  (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe     subi       r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 6c 18 50     subf       r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08     bne        LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0.  For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 7f 9c c8 78     andc       r28,r28,r25 ; clear enablelighting bit if set

802f3648/0030e588: 7c992378 57390dfc
802f3660/0030e5a0: 3c608058 81838a04
802f3680/0030e5c0: 7c6cd050 546307bf
802f36b4/0030e5f4: 40820008 7f7bc878
802f37a8/0030e6e8: 7f9ccb78387afffe7c6c1850546307bf408200087f9cc878

The patched disc image has a SHA-1 of 7fd58c4f3d43c563a835d50d8ecd8ee37b1a2809 or an MD5 of 221a0a2262f711f071f10b693ca57e3d.
... and the game fails to get to the title screen.  Simplifying my changes to just this (using andi. and not caring about chan=4 or 5):
802f3660/0030e5a0: 3c608058 81838a04
802f3680/0030e5c0: 7c6cd050 546307bf
802f36b4/0030e5f4: 40820008 737bfffd

makes things work (though I don't like not caring about chan=4/5), but just 802f3648/0030e588: 7c992378 57390dfc breaks it.  So r25 is important, or the diffusefunc = 0 case is important.  (Note that this is still in Dolphin).  The game boots when I nop out those addresses, so I guess it must be r25.  (Though, a breakpoint at 802f364c is hit on startup, so the diffusefunc = 0 case is triggered, but diffusefunc was already 0 then.)  Huh.  But both callers of J3DGDSetChanCtrl load a new value for r25 after the call, so that doesn't make sense.  It fails on console too, it seems.
... oh, no, both of the callers of J3DGDSetChanCtrl have the call in a loop, and they re-use r25 on each iteration.  I get similar hangs using r24, though I don't see where it's used.  r26 is saved by J3DGDSetChanCtrl, but I don't see other functions saving r25/r24 before they use them; I don't know what the deal with that is.  ... oh!  It's stmw, which doesn't store just 1 register, but every register from the given one up through r31.  So... I guess I can do this all properly, though it requires a bit of messing about with the neighboring instructions too.  I'm not sure I've done it right.
EDIT: I previously said that the following was the final assembly, but it doesn't match what I actually used in practice (and also doesn't fully replace r12 with r24, and the bytes aren't correct for at least one instruction):
802f363c 94 21 ff b8     stwu       r1,-0x48(r1) ; make more room
802f3640 bf 01 00 20     stmw       r24,0x20(r1) ; save r24, r25 in addition to r26+

802f3648 7c 99 23 78     or         r25,r4,r4 ; save a copy of enablelighting in r25
802f364c 57 39 0d fc     rlwinm     r25,r25,0x1,0x17,0x1e ; and shift it left 1 for later use

802f3660 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3664 83 03 8a 04     lwz        r24,-0x75fc(r3) ; r24 contains value at 80578a04 (# lives)

802f3680 7c 78 c0 50     subf       r3,r24,r24 ; r3 = r26 - r24; i.e. r3 is now chan - lives
802f3684 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 7f 7b c8 78     andc       r27,r27,r25 ; clear enablelighting bit if set (r27 holds value to write)

802f37a8 7f 9c cb 78     or         r28,r28,r25 ; Re-add saved enablelighting bit, in case we cleared it before.  (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe     subi       r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 78 18 50     subf       r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08     bne        LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0.  For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 7f 9c c8 78     andc       r28,r28,r25 ; clear enablelighting bit if set

802f3888 bb 01 00 20     lmw        r24,0x20(r1) ; restore r24, r25, in addition to r26+
802f388c 80 01 00 4c     lwz        r0,0x4c(r1)
802f3890 38 21 00 48     addi       r1,r1,0x48

What I actually used still had r12 for lives, leaving r24 unused (even though I saved it).  Oops.
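
A quick way to catch this kind of mismatch between bytes and mnemonics is to decode the register fields straight from the instruction word. The sketch below only handles the three opcode-31 instructions that matter here (subf, or, andc) and is not a general disassembler; `decode_x_form` is a name invented for this example.

```python
def decode_x_form(word):
    """Decode subf / or / andc from a 32-bit PowerPC instruction word."""
    if word >> 26 != 31:
        raise ValueError("not a primary-opcode-31 instruction")
    first = (word >> 21) & 0x1F   # rD for subf, rS for or/andc
    second = (word >> 16) & 0x1F  # rA
    third = (word >> 11) & 0x1F   # rB
    extended = (word >> 1) & 0x3FF
    dot = "." if word & 1 else ""
    if extended == 40:    # subf rD,rA,rB computes rB - rA
        return f"subf{dot} r{first},r{second},r{third}"
    if extended == 444:   # or rA,rS,rB: the destination is the second field
        return f"or{dot} r{second},r{first},r{third}"
    if extended == 60:    # andc rA,rS,rB computes rS & ~rB
        return f"andc{dot} r{second},r{first},r{third}"
    raise ValueError("extended opcode not handled by this sketch")


for word in (0x7C78C050, 0x7C781850, 0x7C6CD050, 0x7C6C1850, 0x7F7BC878):
    print(f"{word:08x}", decode_x_form(word))
```

The first two words are from the listing that did not match: `7c78c050` decodes to `subf r3,r24,r24` (so chan in r26 is never involved), and `7c781850` decodes to `subf r3,r24,r3` even though its mnemonic column says r12. The remaining words are from the r12 version and decode to exactly what their mnemonic columns say.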

The final assembly I came up with is this:
802f363c 94 21 ff b8     stwu       r1,-0x48(r1) ; make more room
802f3640 bf 01 00 20     stmw       r24,0x20(r1) ; save r24, r25 in addition to r26+

802f3648 7c 99 23 78     or         r25,r4,r4 ; save a copy of enablelighting in r25
802f364c 57 39 0d fc     rlwinm     r25,r25,0x1,0x17,0x1e ; and shift it left 1 for later use

802f3660 3c 60 80 58     lis        r3,-0x7fa8 ; r3 contains 80580000
802f3664 81 83 8a 04     lwz        r12,-0x75fc(r3) ; r12 contains value at 80578a04 (# lives)

802f3680 7c 6c d0 50     subf       r3,r12,r26 ; r3 = r26 - r12; i.e. r3 is now chan - lives
802f3684 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 7f 7b c8 78     andc       r27,r27,r25 ; clear enablelighting bit if set (r27 holds value to write)

802f37a8 7f 9c cb 78     or         r28,r28,r25 ; Re-add saved enablelighting bit, in case we cleared it before.  (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe     subi       r3,r26,0x2 ; r3 = (chan - 2)
802f37b0 7c 6c 18 50     subf       r3,r12,r3 ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08     bne        LAB_802f37c0 ; Skip if (chan - 2 - lives) & 3 != 0.  For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 7f 9c c8 78     andc       r28,r28,r25 ; clear enablelighting bit if set

802f3888 bb 01 00 20     lmw        r24,0x20(r1) ; restore r24, r25, in addition to r26+
802f388c 80 01 00 4c     lwz        r0,0x4c(r1)
802f3890 38 21 00 48     addi       r1,r1,0x48

which is equivalent to this:
  enable_saved = enablelighting << 1;
  lives = *((u32 *)0x80578a04);
  // ... setup stuff ...
  if (((chan - lives) & 3) == 0) { // parenthesized: == binds tighter than & in C
    value = value & ~enable_saved;
  }
  // ... Writes to xfmem.color or xfmem.alpha
  if (chan - 4 < 2) { // unsigned compare: chan == 4 or chan == 5
    value = value | enable_saved;
    if ((((chan - 2) - lives) & 3) == 0) {
      value = value & ~enable_saved;
    }
    // ... Writes to xfmem.alpha in addition to earlier writes to xfmem.color
  }
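
As a sanity check, the two conditions in that pseudocode can be run over every channel and lives value and compared against the truth table from earlier. This is a sketch of the pseudocode only. I'm reading Y1 as "cleared in the first write (xfmem.color)" and Y2 as "cleared in the second write (xfmem.alpha)", which is my interpretation of the table, and the function names are invented.

```python
def cleared_writes(chan, lives):
    """Return (first write cleared, second write cleared) for one call."""
    first = ((chan - lives) & 3) == 0
    # The second write only exists for chan 4 and 5.
    second = chan in (4, 5) and ((chan - 2 - lives) & 3) == 0
    return first, second


def cell(chan, lives):
    """Render one truth-table cell in the notation used earlier."""
    first, second = cleared_writes(chan, lives)
    if chan < 4:
        return "Y" if first else "N"
    if first:
        return "Y1"
    if second:
        return "Y2"
    return "N"


for chan in range(6):
    print(f"Chan{chan}", *(cell(chan, lives) for lives in range(4)))

# Clearing the bit itself: enablelighting is bit 1, so enable_saved is 2.
print(hex(0x70F & ~(1 << 1)))  # 0x70d
```

The printed rows match the table: channels 0 to 3 are a plain diagonal, channel 4 gives Y1 at 0 lives and Y2 at 2 lives, and channel 5 gives Y1 at 1 life and Y2 at 3 lives. Note that Python's `&` on a negative difference behaves like the 32-bit two's-complement case here, since only the low two bits are examined.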
Here is the relevant data (with corresponding memory and disc image offsets):
802f363c/0030e57c: 9421ffb8bf010020
802f3648/0030e588: 7c99237857390dfc
802f3660/0030e5a0: 3c60805881838a04
802f3680/0030e5c0: 7c6cd050546307bf
802f36b4/0030e5f4: 408200087f7bc878
802f37a8/0030e6e8: 7f9ccb78387afffe7c6c1850546307bf408200087f9cc878
802f3888/0030e7c8: bb0100208001004c38210048
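
For anyone wanting to script this, here is a minimal Python sketch that parses lines in the `memory/disc: bytes` format used above, checks that every pair has the same memory-to-disc delta, and writes the bytes into a buffer. The buffer is a zero-filled stand-in, not a real disc image, and `parse_patches` / `apply_patches` are names invented for this example.

```python
PATCH_TEXT = """\
802f363c/0030e57c: 9421ffb8bf010020
802f3648/0030e588: 7c99237857390dfc
802f3660/0030e5a0: 3c60805881838a04
802f3680/0030e5c0: 7c6cd050546307bf
802f36b4/0030e5f4: 408200087f7bc878
802f37a8/0030e6e8: 7f9ccb78387afffe7c6c1850546307bf408200087f9cc878
802f3888/0030e7c8: bb0100208001004c38210048
"""


def parse_patches(text):
    """Turn 'mem/disc: hex' lines into (mem, disc, bytes) tuples."""
    patches = []
    for line in text.splitlines():
        location, hex_data = line.split(":")
        mem, disc = (int(part, 16) for part in location.split("/"))
        patches.append((mem, disc, bytes.fromhex(hex_data.strip())))
    return patches


def apply_patches(image, patches):
    """Overwrite bytes in a mutable buffer at each disc offset."""
    for _mem, disc, data in patches:
        image[disc:disc + len(data)] = data


patches = parse_patches(PATCH_TEXT)
deltas = {mem - disc for mem, disc, _data in patches}
print([hex(delta) for delta in deltas])  # ['0x7ffe50c0']

image = bytearray(0x30E800)  # stand-in buffer, NOT a real disc image
apply_patches(image, patches)
print(image[0x30E57C:0x30E584].hex())  # 9421ffb8bf010020
```

All seven pairs share the delta 0x7ffe50c0, as do the other memory/disc pairs quoted in these notes. That only says the pairs are self-consistent for this region of this particular image; it is not a general address-to-offset rule.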

When patched this way, the disc image has a SHA-1 of 645ced238d843211a0ba927aefeaa81aaa0e90f5 and an MD5 of fe8f06006038d7190d6bc22dd5e17402.
A video of the results: https://youtu.be/pROkmawIsgU
The main thing to note is that no debug cubes show up in any of the configurations.  So the software renderer patch I made earlier isn't correct; enablelighting seems to be unrelated to when the debug cubes show up.  On the other hand, note that when xfmem.color[0] has lighting disabled, the platforms no longer have shading as they flip, and when xfmem.color[1] has lighting disabled, Mario and the coin switch are bright, and coins are pure white.  Note also that on collecting a 1-up (which should have switched it from xfmem.color[1] to xfmem.alpha[0] being unlit), Mario's hat is no longer bright, but his body is, and the coins are still white.  I don't know why that is, but it also happens in Dolphin, so I'm not too worried about it.  I couldn't see any changes for xfmem.alpha[0] or xfmem.alpha[1].
Since J3DGDSetChanCtrl being disabled did have an effect on debug cubes, but enablelighting wasn't what made them show up, it must be one of the other parameters.  Further research is still needed.

  
## update3.md

      
            
          
At around 802a6740-802a6790 there are 21 free instructions (starting with the call to DVDCheckDisk, ending with the unmount) in code that runs on a reset.  State 9 is an unused level-select menu, which is quite nice (it even sets the number of lives to 3!).  To access the level with the debug cubes, select Pianta Village, EX 4.  (I don't know why it's in Pianta Village and not Bianco Hills.)  TCRF's code writes 3bc00009 to 802a6788, replacing the newState = 4 line.
Process of creating the patch
This is the code that will be replaced:
if ((1 << (int)(this->gamePads[0]->field_0x0).field_0x78 & 0xffffU & (uint)_mResetFlag) != 0) {
  _mResetFlag = 0;
  recalibrate((JUTGamePad *)0xf0000000,1);
  diskOK = DVDCheckDisk();
  if (diskOK == 0) {
    newState = 7;
    this->field_0x44 = this->field_0x44 | 2;
  }
  else {
    if ((this->gamestate == 2) || (this->gamestate == 3)) {
      newState = 7;
    }
    else {
      if (newState != 7) {
        newState = 4;
        unmount(gpCardManager);
      }
    }
  }
}

We have free use of r0, r30, r3, r4, and r5 here.  r30 is the new gamestate, so it must be set to 9 at the end.
802a6790 3b c0 00 09     li         r30,0x9 ; set new game state to 9.

We need a place to store a regular variable, but fortunately MSBgm::smMainVolume at the start of the SDA is a float and there are 4 unused bytes after it at 8040c1c4 (r13 - 0x7FFC).  We could put 802a6778 at that address, to reference the table directly.
Initial version: (no actual payload yet, but it probably would work?)
802a6740 80 ad 80 04     lwz        r5,-0x7ffc(r13)=>DAT_8040c1c4
802a6744 38 85 00 04     addi       r4,r5,0x4
802a6748 90 8d 80 04     stw        r4,-0x7ffc(r13)=>DAT_8040c1c4
802a674c 80 85 00 00     lwz        r4,0x0(r5)=>LAB_802a6778
802a6750 3c a0 80 2f     lis        r5,-0x7fd1
802a6754 90 85 36 48     stw        r4,offset LAB_802f3648(r5)
802a6758 60 00 00 00     ori        r0,r0,0x0
802a675c 60 00 00 00     ori        r0,r0,0x0
802a6760 60 00 00 00     ori        r0,r0,0x0
802a6764 60 00 00 00     ori        r0,r0,0x0
802a6768 60 00 00 00     ori        r0,r0,0x0
802a676c 60 00 00 00     ori        r0,r0,0x0
802a6770 60 00 00 00     ori        r0,r0,0x0
802a6774 48 00 00 1c     b          LAB_802a6790
                     LAB_802a6778                                    XREF[1]:     802a674c(R)  
802a6778 60 00 00 00     ori        r0,r0,0x0
802a677c 60 00 00 00     ori        r0,r0,0x0
802a6780 60 00 00 00     ori        r0,r0,0x0
802a6784 60 00 00 00     ori        r0,r0,0x0
802a6788 60 00 00 00     ori        r0,r0,0x0
802a678c 60 00 00 00     ori        r0,r0,0x0
                     LAB_802a6790                                    XREF[1]:     802a6774(j)  
802a6790 3b c0 00 09     li         r30,0x9

New version, still no payload, but DAT_8040c1c4 can remain zero-initialized and it loops; however it starts at 802a6774 instead of 802a6770:
802a6740 80 ad 80 04     lwz        r5,-0x7ffc(r13)=>DAT_8040c1c4
802a6744 38 85 00 04     addi       r4,r5,0x4
802a6748 54 84 06 fe     rlwinm     r4,r4,0x0,0x1b,0x1f
802a674c 90 8d 80 04     stw        r4,-0x7ffc(r13)=>DAT_8040c1c4
802a6750 3c a0 80 2a     lis        r5,-0x7fd6
802a6754 7c a4 2a 14     add        r5,r4,r5
802a6758 80 85 67 70     lwz        r4,offset LAB_802a6770(r5)
802a675c 60 00 00 00     ori        r0,r0,0x0
802a6760 3c a0 80 2f     lis        r5,-0x7fd1
802a6764 94 85 36 48     stwu       r4,offset LAB_802f3648(r5)
802a6768 7c 00 2b ac     dcbi       0,r5
802a676c 48 00 00 24     b          LAB_802a6790
                     LAB_802a6770                                    XREF[1]:     802a6758(R)  
802a6770 60 00 00 00     ori        r0,r0,0x0
802a6774 60 00 00 00     ori        r0,r0,0x0
802a6778 60 00 00 00     ori        r0,r0,0x0
802a677c 60 00 00 00     ori        r0,r0,0x0
802a6780 60 00 00 00     ori        r0,r0,0x0
802a6784 60 00 00 00     ori        r0,r0,0x0
802a6788 60 00 00 00     ori        r0,r0,0x0
802a678c 60 00 00 00     ori        r0,r0,0x0
                     LAB_802a6790                                    XREF[1]:     802a676c(j)  
802a6790 3b c0 00 09     li         r30,0x9

Almost final version, no longer uses stwu because I had a spare instruction anyways; also no longer starts at index 1.  To be annotated and have a payload still.
802a6740 80 ad 80 04     lwz        r5,-0x7ffc(r13)=>UINT_8040c1c4
802a6744 38 85 00 04     addi       r4,r5,0x4
802a6748 54 84 06 fe     rlwinm     r4,r4,0x0,0x1b,0x1f
802a674c 90 8d 80 04     stw        r4,-0x7ffc(r13)=>UINT_8040c1c4
802a6750 3c 80 80 2a     lis        r4,-0x7fd6
802a6754 7c 84 2a 14     add        r4,r4,r5
802a6758 80 a4 67 70     lwz        r5,offset LAB_802a6770(r4)
802a675c 3c 80 80 2f     lis        r4,-0x7fd1
802a6760 90 a4 36 48     stw        r5,offset LAB_802f3648(r4)
802a6764 38 84 36 48     addi       r4,r4,0x3648
802a6768 7c 00 2b ac     dcbi       0,r5
802a676c 48 00 00 24     b          LAB_802a6790
                     LAB_802a6770                                    XREF[1]:     802a6758(R)  
802a6770 60 00 00 00     ori        r0,r0,0x0
802a6774 60 00 00 00     ori        r0,r0,0x0
802a6778 60 00 00 00     ori        r0,r0,0x0
802a677c 60 00 00 00     ori        r0,r0,0x0
802a6780 60 00 00 00     ori        r0,r0,0x0
802a6784 60 00 00 00     ori        r0,r0,0x0
802a6788 60 00 00 00     ori        r0,r0,0x0
802a678c 60 00 00 00     ori        r0,r0,0x0
                     LAB_802a6790                                    XREF[1]:     802a676c(j)  
802a6790 3b c0 00 09     li         r30,0x9

When I did my hardware testing, I used the following changes (based on the final version outside of this collapsed section); these use the previous patch to J3DGDSetChanCtrl that used r12 instead of r24.  There shouldn't be a difference, but I'm noting that just in case.
802a6740/002c1680: 80ad800438850004548406fe908d80043c80802a7c842a1480a467703c80802f90a4364c3884364c7c0027ac4800002454990dfc54b930327cd9337854f916ba54f93c6855193830392000004800023c3bc00009
802f363c/0030e57c: 9421ffb8bf010020
802f3648/0030e588: 3b20000060000000
802f3660/0030e5a0: 3c60805881838a04
802f3680/0030e5c0: 7c6cd050546307bf
802f36b4/0030e5f4: 408200087f7bc878
802f37a8/0030e6e8: 7f9ccb78387afffe7c6c1850546307bf408200087f9cc878
802f3888/0030e7c8: bb0100208001004c38210048

and the disc image should have a SHA-1 of 7f3d1ed78f3f3c51d84b39eaaec951feb7860305 or an MD5 of c858426c3f6225c37fcc41dfebea3c70.

Here's the patch I came up with, with annotations:
802a6740 80 ad 80 04     lwz        r5,-0x7ffc(r13)=>DAT_8040c1c4  ; r5 = counter, loaded from SDA
802a6744 38 85 00 04     addi       r4,r5,0x4                      ; r4 = counter + 4, adding 4 because each instruction is 4 bytes
802a6748 54 84 06 fe     rlwinm     r4,r4,0x0,0x1b,0x1f            ; r4 = (counter + 4) & 0x1f, i.e. wrap around after 8 counts
802a674c 90 8d 80 04     stw        r4,-0x7ffc(r13)=>DAT_8040c1c4  ; Save new counter
802a6750 3c 80 80 2a     lis        r4,-0x7fd6                     ; r4 = 802a0000
802a6754 7c 84 2a 14     add        r4,r4,r5                       ; r4 += r5 (original counter value)
802a6758 80 a4 67 70     lwz        r5,offset DAT_802a6770(r4)     ; r5 = *(r4 + 6770), i.e. r5 = payload_table[counter]
802a675c 3c 80 80 2f     lis        r4,-0x7fd1                     ; r4 = 802f0000
802a6760 90 a4 36 4c     stw        r5,offset LAB_802f364c(r4)     ; *((u32 *)802f364c) = r5, i.e. patch J3DGDSetChanCtrl with payload
802a6764 38 84 36 4c     addi       r4,r4,0x364c                   ; r4 = 802f364c
802a6768 7c 00 27 ac     icbi       0,r4                           ; invalidate icache for patched J3DGDSetChanCtrl (does not need to be aligned)
802a676c 48 00 00 24     b          LAB_802a6790                   ; jump past following table of payloads
802a6770 54 99 0d fc     rlwinm     r25,r4,0x1,0x17,0x1e           ; 1st payload: r25 = enablelighting << 1
802a6774 54 b9 30 32     rlwinm     r25,r5,0x6,0x0,0x19            ; 2nd payload: r25 = ambSource << 6
802a6778 7c d9 33 78     or         r25,r6,r6                      ; 3rd payload: r25 = matsource
802a677c 54 f9 16 ba     rlwinm     r25,r7,0x2,0x1a,0x1d           ; 4th payload: r25 = (lightMask & 0xf) << 2 (lower 4 bits)
802a6780 54 f9 3c 68     rlwinm     r25,r7,0x7,0x11,0x14           ; 5th payload: r25 = (lightMask << 7) & 0x7800, i.e. (lightMask & 0xf0) << 7 (upper 4 bits)
802a6784 55 19 38 30     rlwinm     r25,r8,0x7,0x0,0x18            ; 6th payload: r25 = diffuseFunc << 7
802a6788 39 20 00 00     li         r9,0x0                         ; 7th payload: Set attnFunc to 0 (r25 already zero'd)
802a678c 48 00 02 3c     b          LAB_802f3888                   ; 8th payload: Jump to end of J3DGDSetChanCtrl, basically a return
802a6790 3b c0 00 09     li         r30,0x9                        ; Switch game state to the level select menu
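
The rlwinm payloads are easier to check once the rotate amount and mask are pulled out of each instruction word. The following is a small Python sketch for doing that, not part of the patch itself; the helper names are made up for illustration, and the field positions follow the standard PowerPC M-form encoding.

```python
def decode_rlwinm(word):
    """Split a 32-bit rlwinm instruction word into (rs, ra, sh, mb, me)."""
    if word >> 26 != 21:
        raise ValueError("not an rlwinm instruction")
    rs = (word >> 21) & 0x1F  # source register
    ra = (word >> 16) & 0x1F  # destination register
    sh = (word >> 11) & 0x1F  # rotate-left amount
    mb = (word >> 6) & 0x1F   # first mask bit (bit 0 is the MSB)
    me = (word >> 1) & 0x1F   # last mask bit
    return rs, ra, sh, mb, me


def rlwinm_mask(mb, me):
    """AND mask covering bits mb..me in PowerPC numbering (no wraparound)."""
    if mb > me:
        raise ValueError("wrapping masks are not handled in this sketch")
    return (0xFFFFFFFF >> mb) & (0xFFFFFFFF << (31 - me)) & 0xFFFFFFFF


def rlwinm(value, sh, mb, me):
    """Rotate value left by sh bits, then AND it with the mb..me mask."""
    rotated = ((value << sh) | (value >> (32 - sh))) & 0xFFFFFFFF
    return rotated & rlwinm_mask(mb, me)


# The five rlwinm payloads from the table at 802a6770
for word in (0x54990DFC, 0x54B93032, 0x54F916BA, 0x54F93C68, 0x55193830):
    rs, ra, sh, mb, me = decode_rlwinm(word)
    print(f"{word:08x}: r{ra} = rotl(r{rs}, {sh}) & {rlwinm_mask(mb, me):#010x}")
```

Each printed mask is the set of ChanCtrl bits that the payload leaves in r25, which is what the andc instructions later clear.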

This patch basically is this:
if ((1 << (int)(this->gamePads[0]->field_0x0).field_0x78 & 0xffffU & (uint)_mResetFlag) != 0) {
  _mResetFlag = 0;
  recalibrate((JUTGamePad *)0xf0000000,1);
  *((u32*)0x802f364c) = payload_table[counter]; // patch J3DGDSetChanCtrl
  counter = (counter + 1) & 7;
  instructionCacheBlockInvalidate(0x802f364c); // invalidate icache for J3DGDSetChanCtrl
  newState = 9;
}
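
The counter handling can also be modelled outside the game. This is a rough Python sketch of the reset logic only: `on_reset` is a hypothetical name, and the counter is kept as a byte offset (as in the assembly) rather than the word index used in the pseudocode above.

```python
PAYLOAD_TABLE = [
    0x54990DFC,  # 1st: enablelighting
    0x54B93032,  # 2nd: ambSource
    0x7CD93378,  # 3rd: matsource
    0x54F916BA,  # 4th: lightMask, lower 4 bits
    0x54F93C68,  # 5th: lightMask, upper 4 bits
    0x55193830,  # 6th: diffuseFunc
    0x39200000,  # 7th: force attnFunc to 0
    0x4800023C,  # 8th: branch to the end of J3DGDSetChanCtrl
]


def on_reset(counter):
    """Model one reset; counter is a byte offset into the payload table.

    Returns the word stored to 802f364c and the new counter value."""
    payload = PAYLOAD_TABLE[counter // 4]
    return payload, (counter + 4) & 0x1F


counter = 0  # DAT_8040c1c4 starts out zero-initialized
for reset in range(1, 10):
    payload, counter = on_reset(counter)
    print(f"reset {reset}: 802f364c <- {payload:08x}")
```

The ninth reset wraps back around to the first payload, since the counter is masked with 0x1f.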
Testing found that attnFunc had really weird behavior: 0 gives 200, 1 gives 600, 2 gives 400, 3 gives 600.  Since I don't understand this, the payload is just to always set attnFunc to 0 instead of trying to emulate that in one instruction for masking.  Other than that, payloads are mostly logical.
The patch for J3DGDSetChanCtrl itself is mostly unchanged, only adjusting the 802f3648/802f364c line (which previously set up r25 with the value of enablelighting, and is now overwritten) and fixing a mistake where I used r12 instead of r24.
802f363c 94 21 ff b8     stwu       r1,-0x48(r1) ; make more room
802f3640 bf 01 00 20     stmw       r24,0x20(r1) ; save r24, r25 in addition to r26+

802f3648 3b 20 00 00     li         r25,0x0      ; Clear r25 (saved value that is cleared/reset), most payloads will set it again afterwards
802f364c 60 00 00 00     ori        r0,r0,0x0    ; nop to be replaced

802f3660 3c 60 80 58     lis        r3,-0x7fa8      ; r3 contains 80580000
802f3664 83 03 8a 04     lwz        r24,-0x75fc(r3) ; r24 contains value at 80578a04 (# lives)

802f3680 7c 78 d0 50     subf       r3,r24,r26          ; r3 = r26 - r24; i.e. r3 is now chan - lives
802f3684 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, but this is only done for comparison, not the value of r3, looking for (chan - lives) & 3 == 0, i.e. treat 4/5 as 0/1

802f36b4 40 82 00 08     bne        LAB_802f36bc ; jump past next instruction if (chan - lives) & 3 != 0, i.e. lives & 3 != chan & 3
802f36b8 7f 7b c8 78     andc       r27,r27,r25  ; clear enablelighting bit if set (r27 holds value to write)

802f37a8 7f 9c cb 78     or         r28,r28,r25 ; Re-add saved enablelighting bit, in case we cleared it before.  (r28 now contains the value that's written (r28 = r27 masked to a byte for some reason)).
802f37ac 38 7a ff fe     subi       r3,r26,0x2          ; r3 = (chan - 2)
802f37b0 7c 78 18 50     subf       r3,r24,r3           ; r3 = (chan - 2) - lives; since chan = 4 or chan = 5, we have r3 = (2 - lives) or (3 - lives)
802f37b4 54 63 07 bf     rlwinm.    r3,r3,0x0,0x1e,0x1f ; r3 = r3 & 3, or r3 = (chan - 2 - lives) & 3; again, this is only to check if equal to zero
802f37b8 40 82 00 08     bne        LAB_802f37c0        ; Skip if (chan - 2 - lives) & 3 != 0.  For chan == 4, this tests lives & 3 != 2; for chan == 5, lives & 3 != 3.
802f37bc 7f 9c c8 78     andc       r28,r28,r25         ; clear enablelighting bit if set

802f3888 bb 01 00 20     lmw        r24,0x20(r1) ; restore r24, r25, in addition to r26+
802f388c 80 01 00 4c     lwz        r0,0x4c(r1)
802f3890 38 21 00 48     addi       r1,r1,0x48


Here are the final patches:
802a6740/002c1680: 80ad800438850004548406fe908d80043c80802a7c842a1480a467703c80802f90a4364c3884364c7c0027ac4800002454990dfc54b930327cd9337854f916ba54f93c6855193830392000004800023c3bc00009
802f363c/0030e57c: 9421ffb8bf010020
802f3648/0030e588: 3b20000060000000
802f3660/0030e5a0: 3c60805883038a04
802f3680/0030e5c0: 7c78d050546307bf
802f36b4/0030e5f4: 408200087f7bc878
802f37a8/0030e6e8: 7f9ccb78387afffe7c781850546307bf408200087f9cc878
802f3888/0030e7c8: bb0100208001004c38210048

and the disc image should have a SHA-1 of 5dafe78c7cb03aae032d49df025f83b29da323ef or an MD5 of d581baefdb4af44ba8a21b1239ee42d7.
GameINI patch
For whatever reason, I couldn't get this to work with [OnLoad], but [OnFrame] does work.  Note that 802F364C needs a comparison because it is overwritten at runtime; if it were replaced every frame nothing interesting would happen.
[OnFrame]
$Rendering debugging (reset to use)
0x802A6740:dword:0x80AD8004
0x802A6744:dword:0x38850004
0x802A6748:dword:0x548406FE
0x802A674C:dword:0x908D8004
0x802A6750:dword:0x3C80802A
0x802A6754:dword:0x7C842A14
0x802A6758:dword:0x80A46770
0x802A675C:dword:0x3C80802F
0x802A6760:dword:0x90A4364C
0x802A6764:dword:0x3884364C
0x802A6768:dword:0x7C0027AC
0x802A676C:dword:0x48000024
0x802A6770:dword:0x54990DFC
0x802A6774:dword:0x54B93032
0x802A6778:dword:0x7CD93378
0x802A677C:dword:0x54F916BA
0x802A6780:dword:0x54F93C68
0x802A6784:dword:0x55193830
0x802A6788:dword:0x39200000
0x802A678C:dword:0x4800023C
0x802A6790:dword:0x3BC00009
0x802F363C:dword:0x9421FFB8
0x802F3640:dword:0xBF010020
0x802F3648:dword:0x3B200000
0x802F364C:dword:0x60000000:0x39000000
0x802F3660:dword:0x3C608058
0x802F3664:dword:0x83038A04
0x802F3680:dword:0x7C78D050
0x802F3684:dword:0x546307BF
0x802F36B4:dword:0x40820008
0x802F36B8:dword:0x7F7BC878
0x802F37A8:dword:0x7F9CCB78
0x802F37AC:dword:0x387AFFFE
0x802F37B0:dword:0x7C781850
0x802F37B4:dword:0x546307BF
0x802F37B8:dword:0x40820008
0x802F37BC:dword:0x7F9CC878
0x802F3888:dword:0xBB010020
0x802F388C:dword:0x8001004C
0x802F3890:dword:0x38210048
[OnFrame_Enabled]
$Rendering debugging (reset to use)
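
The [OnFrame] lines are just the hex patches from before split into 32-bit words. A minimal Python sketch of that conversion follows; `patch_to_ini` is a made-up helper, and it assumes each patch is a whole number of words.

```python
def patch_to_ini(address, hex_bytes):
    """Turn one 'address: hexbytes' patch into 0xADDR:dword:0xVALUE lines."""
    if len(hex_bytes) % 8 != 0:
        raise ValueError("patch must be a whole number of 32-bit words")
    lines = []
    for i in range(0, len(hex_bytes), 8):
        word = hex_bytes[i:i + 8].upper()
        # Two hex digits per byte, so the byte offset is i // 2
        lines.append(f"0x{address + i // 2:08X}:dword:0x{word}")
    return lines


for line in patch_to_ini(0x802F363C, "9421ffb8bf010020"):
    print(line)
```

The comparison value on the 802F364C line is not covered by this sketch and would still have to be added by hand.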


The main result here was that debug cubes only showed up when xfmem.alpha[0].matsource was set to 0 (and when J3DGDSetChanCtrl is completely skipped, as was found before), although there were interesting graphical artifacts with other cases, detailed in the video's description.  This is a very specific result, so it should be usable for further, more specific testing.  (To test it with the patch, reset and select a level 3 times, and then die once.)

  
## update4.md

      
The following is a bit disjointed because I tried a few different things, but nothing really seemed to work, and then I got sidetracked with FIFO analyzer improvements.  It's research that was done over several days.  Posting this now so that if I do discover something, it's a bit better organized.

The level in question is actually data/scene/coro_ex0.szs.  It can be extracted by ARCTool, and then scene/map/map/map.bmd can be opened using bmdview2.  Although bmdview2 does not seem to handle the relevant effects completely, the cubes do show up, so if I need to try to investigate the data, that's how I'll do it.  As a similar note, the level can be loaded via Corona Mountain extra 0, in addition to Pianta Village extra 4 and Noki Bay extra 2; I guess there just isn't bounds checking here.  (The levels above are Sirena Beach and Pinna Park, which have level selects that list sub-areas in a clearer fashion, so there are no extras 6 or 8 to try).
I used PR 9334 (conditional breakpoints) to set a breakpoint at 802f379c with condition r27 == 0x681.  It is hit 2 times on level load, and that's it:
Hit 1:

802d7b64 (JSystem::J3DColorBlockLightOff::load)
802db058 (JSystem::J3DMaterial::makeDisplayList)
802de730 (JSystem::J3DModel::makeDL)
8018806c (Map::TJointModel::initActor)
80187f58 (Map::TJointModel::initJointModel)
80194550 (Map::TMapModel::initJointModel)
80188774 (Map::TJointModelManager::initJointModel)
801944d8 (Map::TMapModelManager::init)
801896a8 (Map::TMap::load)
802a07a0 (JDrama::TViewObjPtrListT<THitActor, JDrama::TViewObj>::load)

Hit 2:

802d7b64 (JSystem::J3DColorBlockLightOff::load)
802db058 (JSystem::J3DMaterial::makeDisplayList)
80239e20 (M3DUtil::MActor::initDL)
8023a3cc (M3DUtil::MActor::setModel)
801880a4 (Map::TJointModel::initActor)
80187f58 (Map::TJointModel::initJointModel)
80194550 (Map::TMapModel::initJointModel)
80188774 (Map::TJointModelManager::initJointModel)
801944d8 (Map::TMapModelManager::init)
801896a8 (Map::TMap::load)

I now understand that J3DGDSetChanCtrl is writing into a buffer; sometimes that's only done once (e.g. main stage data), and other times I guess it is done once each frame (e.g. Mario's hat but not body?).  Interesting.
J3DGDSetChanCtrl is also never called with 4 or 5 as a parameter; both J3DColorBlockLightOn and J3DColorBlockLightOff have a 4-count loop that sets all channels, using a stack-allocated copy of an array in memory (JSystem::@329 at 803a9f90) that is copied each iteration of the loop for some reason; the order that is used is 0, 2, 1, 3 (i.e. xfmem.color[0], xfmem.alpha[0], xfmem.color[1], xfmem.alpha[1]).  Were this calling code not so needlessly convoluted, I would have noticed the lack of 4 and 5 and not bothered to implement the special handling of it.  My future tweaks to the patch for Sunshine won't include it.
Looking at how matsource is handled in Dolphin's software renderer, first note that the current debug cube handling happens in SWVertexLoader::ParseColorAttributes which is called by SWVertexLoader::ParseVertex, which is called by SWVertexLoader::DrawCurrentBatch.  That later calls TransformUnit::TransformColor which is what actually implements matsource: for color, it overwrites the R, G, and B bytes (actually implemented by overwriting all of them, but alpha is set later) with those from xfmem.matColor[chan], and for alpha, it overwrites the alpha byte.  Note that this overwriting happens when matsource is false; when it is true it uses the vertex's color.
Based on this, it seems rather obvious that setting xfmem.alpha[0].matsource to false would make the cubes show up: both xfmem.matColors are set to 0xffffffff, so setting it to false simply sets alpha to 0xff (i.e. visible).  BUT, there IS one thing it tells us: when the cubes showed up on console with just alpha[0].matsource = false, they were black.  Doing the same thing in current Dolphin builds also makes them show up, but they have their texture visible.  Furthermore, they didn't show up when color[0].matsource = false, though other platforms did turn white; this does show that alpha defaults to 0 (though perhaps that's obvious from them not showing up normally).  What matters is that this shows that the color the debug cubes have is 0x00000000, instead of the current SW and HW implementations' 0x00ffffff.
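
The reasoning above can be written out as a toy model. This is a simplified Python sketch, not Dolphin's actual TransformUnit code: it ignores lighting entirely, and the function and constant names are made up for illustration.

```python
def apply_matsource(vertex, material, color_from_vertex, alpha_from_vertex):
    """Pick RGB and alpha separately, each from the vertex or the material register.

    Colors are (r, g, b, a) tuples; lighting is ignored in this sketch."""
    r, g, b = vertex[:3] if color_from_vertex else material[:3]
    a = vertex[3] if alpha_from_vertex else material[3]
    return (r, g, b, a)


MATERIAL = (0xFF, 0xFF, 0xFF, 0xFF)         # xfmem.matColor, 0xffffffff
CONSOLE_DEFAULT = (0x00, 0x00, 0x00, 0x00)  # what the console results suggest
DOLPHIN_DEFAULT = (0xFF, 0xFF, 0xFF, 0x00)  # transparent white, as in current Dolphin

for name, default in (("console", CONSOLE_DEFAULT), ("dolphin", DOLPHIN_DEFAULT)):
    # Unpatched: color and alpha both come from the (missing) vertex color
    print(name, "unpatched:    ", apply_matsource(default, MATERIAL, True, True))
    # alpha[0].matsource = false: alpha comes from the material register
    print(name, "alpha patched:", apply_matsource(default, MATERIAL, True, False))
    # color[0].matsource = false: RGB comes from the material register
    print(name, "color patched:", apply_matsource(default, MATERIAL, False, True))
```

Only a default of all zeroes gives opaque black for the alpha patch and an invisible result for the color patch, which is the combination seen on console.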
Unfortunately, this doesn't explain why the debug cubes have 0 as their color but the bridge or reflection does not.
(Failed) idea regarding diffusefunc
In SMS, Object 35 is the main platforms and Object 36 is the screws in those platforms.  Object 35 has 701, 202, 701, 400 (set in Object 34, see the note about it being unintuitive in the wiki), while Object 36 has 70f, 202, 701, 400 (set in Object 35).  Object 36 matches the cubes (70f, 202, 681, 400, set in Object 247 for Object 248; Object 247 actually sets it multiple times...) apart from the diffusefunc in xfmem.alpha[0], which the cubes have at SIGN (1) but the screws have at CLAMP (2).  I'm not sure whether the screws have colors set or not, and this is a leap of logic, but it feels like 701 might be the default value used for xfmem.alpha[0], so maybe SIGN vs CLAMP does it?  As far as I can tell, diffusefunc shouldn't matter if no lights are enabled in the mask, so it being changed seems suspicious.
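
To double-check which field actually differs between 701 and 681, the two values can be decoded field by field. This is a Python sketch only; the bit layout is an assumption inferred from the field values listed in explanation.md, and the helper names are invented.

```python
# (name, lowest bit, width); layout assumed from the field values in explanation.md
LIT_CHANNEL_FIELDS = [
    ("matsource", 0, 1),
    ("enablelighting", 1, 1),
    ("lightMask0_3", 2, 4),
    ("ambsource", 6, 1),
    ("diffusefunc", 7, 2),
    ("attnfunc", 9, 2),
    ("lightMask4_7", 11, 4),
]


def decode_lit_channel(value):
    """Split a color/alpha channel control value into its named fields."""
    return {name: (value >> shift) & ((1 << width) - 1)
            for name, shift, width in LIT_CHANNEL_FIELDS}


def diff_lit_channel(a, b):
    """Return only the fields whose values differ, as {name: (a_value, b_value)}."""
    da, db = decode_lit_channel(a), decode_lit_channel(b)
    return {name: (da[name], db[name]) for name in da if da[name] != db[name]}


# Screws (701) against cubes (681) for xfmem.alpha[0]
print(diff_lit_channel(0x701, 0x681))
```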
802f3634 2c 08 00 01     cmpwi      r8,0x1          ; check if diffusefunc is SIGN (1)
802f3648 40 82 00 08     bne        LAB_802f3650    ; skip next instruction if not (unchanged)
802f364c 39 00 00 02     li         r8,0x2          ; set diffusefunc to CLAMP (2)

The actual changes (also including level select) are:
802a6788/002c16c8: 3bc00009
802f3634/0030e574: 2c080001
802f364c/0030e58c: 39000002

and a disc image with SHA-1 9fe3d455127e61afb0e755df3422441a58bdff77 or MD5 7db014cb1bc8e78bbb15635777683299.
... and nothing; I don't see any rendering changes on console at all.  Well, it was worth a shot.

One thing that seems odd is that the cubes are slightly transparent depending on the angle you look at them from (and more transparent the farther you are).  This can be seen in both Dolphin (particularly with freelook, but it's not required) (image 1 (enlarged), image 2 (enlarged)) and with bmdview2.  It can also be seen on console with the black cubes (screenshot from previous video; I did some further testing and confirmed that it is definitely transparency as I could see the rotating platform behind it).  This happens even if diffusefunc is forced to 0 (802f3634 set to 39000000).
The bmdview source code mentions https://kuribo64.net/board/thread.php?id=532#14984
Monkey Ball sets 100c to 0xffffffff and 100a to 7f7299ff in object 607.  I don't see where it sets 100d or 100b.
One thing to note is that the cubes are BLACK when only the alpha matsource is disabled.  Not white, as in Dolphin.
One of the blue platforms at the start doesn't even follow the cubes.

New idea: cp vtxspecs vs xf vtxspecs, mismatched num colors.
After testing, it looks like they match, so that's not it:

SMS has color 0 = not present in VCD_LO, and num colors = 0 in XFMEM_VTXSPECS; it also has 1 normal and 1 tex coord (in both), in Object 247.  Position, normal, and tex coord are 16-bit indexes.
Monkey Ball: color 0 = not present in VCD_LO, and num colors = 0 in XFMEM_VTXSPECS.  1 normal and 0 tex coords, in object 616.  Position and normal are direct.
Mario Kart Wii:  1 normal and 3 tex coords, 0 colors, 8-bit indexes for all (object 30).
Monster Jam:  1 normal and 1 tex coord, 16-bit indexes.  No colors.  Looking in object 66 here but there are a bunch of different objects that don't render.

OK, what about the VAT one?

SMS's CP_VAT_REG_A - Format 0:

Color 0 elements: 4 (r, g, b, a) (1)
Color 0 format: RGBA 32 bits 8888 (5)
Color 1 elements: 4 (r, g, b, a) (1)
Color 1 format: RGBA 32 bits 8888 (5)


Monkey Ball: I couldn't find where it set CP_VAT_REG_A; it probably happens in a much earlier object.  I'll return to that later.
Mario Kart Wii's CP_VAT_REG_A - Format 0:

Color 0 elements: 4 (r, g, b, a) (1)
Color 0 format: RGBA 32 bits 8888 (5)
Color 1 elements: 4 (r, g, b, a) (1)
Color 1 format: RGBA 32 bits 8888 (5)


Monster Jam: CP_VAT_REG_A - Format 7:

Color 0 elements: 4 (r, g, b, a) (1)
Color 0 format: RGBA 32 bits 8888 (5)
Color 1 elements: 3 (r, g, b) (0)
Color 1 format: RGB 16 bits 565 (0)

This doesn't seem relevant either.
There's also the BP genmode:

SMS has Num color channels: 1
Monkey Ball has Num color channels: 1
Mario Kart Wii has Num color channels: 1
Monster Jam has Num color channels: 1

OK, that's not it either.

  
## update5.md

      
So, in summary, here's what I learned.

On console, if xfmem.alpha[0].matsource is set to MatSource::MatColorRegister (0), then the debug cubes show up pitch black.  If xfmem.color[0].matsource is set to MatSource::MatColorRegister, they still don't show up.  The material color is set to 0xFFFFFFFF.  This indicates that the true color of the debug cubes by the time matsource is applied is 0x00000000 (transparent black), not 0x00FFFFFF (transparent white) as currently implemented in Dolphin.  This is the only thing that's really actionable, and should be enough to make a hackfix PR.
None of the other values in xfmem.alpha or xfmem.color seem to do anything.
VCD_LO, XFMEM_VTXSPECS, CP_VAT_REG_A, and BP_GENMODE all seem to be consistent.
The debug cubes are slightly transparent (on console, in dolphin, and in bmdview2), and seem to be more transparent depending on your distance and angle.  Also, the transparent rotating cubes have render-order issues with them (I only discovered this now).
One of the sets of debug cubes doesn't line up with the actual path the moving platform takes.
Apparently I actually do have the skills to make ASM patches, even if I did overcomplicate things beyond what was actually needed for testing.

I also have a video that demonstrates the black cubes on console (including their transparency).

  
## update6.md

      
Actually, the transparency does give me one more thing to look at - alpha fog!
Well, actually, alpha fog doesn't exist; fog only seems to affect the RGB components.  But it's definitely something very fog-like...  The only thing I can think of is blending.  And the relevant blend commands are interesting:
00000fec:  BP  41 0054a9   BPMEM_BLENDMODE
....
00001195:  BP  fe 001fe3   BPMEM_BP_MASK
0000119a:  BP  41 0034a1   BPMEM_BLENDMODE


BP register BPMEM_BLENDMODE
Blend mode: Enable: Yes
Logic ops: No
Dither: No
Color write: Yes
Alpha write: No
Dest factor: 1-src_alpha (5)
Source factor: src_alpha (4)
Subtract: No
Logic mode: d (5)
BP register BPMEM_BP_MASK
The next BP command will only update these bits; others will retain their prior values: 001fe3
BP register BPMEM_BLENDMODE
Blend mode: Enable: Yes
Logic ops: No
Dither: No
Color write: No
Alpha write: No
Dest factor: 1-src_alpha (5)
Source factor: src_alpha (4)
Subtract: No
Logic mode: s (3)


There are a ton of earlier blend commands (and other commands) that get overwritten later, but these two both matter because BPMEM_BP_MASK is used here.  And strangely, the mask used doesn't actually cover all of the bits set; the 4 bits x in 00x000 are the logic op/logic mode, and yet the mask only covers the lowest of those 4 bits.  So instead of the second command changing the value from 5 to 3, it changes it to (5 & ~1) | (3 & 1) = 4 | 1 = 5.  Which, well, is the same as a proper mask that leaves it unchanged, but it still seems weird here.  The mask's low byte (e3) also leaves out bit 3, which is the color write bit (the only difference between a9 and a1), so color write keeps its earlier value of Yes.  (Note that the FIFO analyzer doesn't handle the masking directly even with my changes.  I'd assumed that no command would try to set bits not in the mask, and that if the mask did matter in a case, it'd be semi-obvious what bits were being updated.)  Logic ops aren't even enabled in either case, so it really doesn't make sense at all.  Here's the actual new value taking the mask into consideration:
BP  41 0054a9   BPMEM_BLENDMODE

BP register BPMEM_BLENDMODE
Blend mode: Enable: Yes
Logic ops: No
Dither: No
Color write: Yes
Alpha write: No
Dest factor: 1-src_alpha (5)
Source factor: src_alpha (4)
Subtract: No
Logic mode: d (5)
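
The masked update can be checked mechanically rather than by hand. A small Python sketch, assuming that BPMEM_BP_MASK works as described above (masked bits come from the new command, all other bits keep their old value); the function names are hypothetical.

```python
def apply_bp_mask(old, new, mask):
    """Bits set in mask come from new; every other bit keeps its old value."""
    return ((old & ~mask) | (new & mask)) & 0xFFFFFF


def logic_mode(blendmode):
    """Logic mode field: the 4 bits x in 00x000."""
    return (blendmode >> 12) & 0xF


# First BPMEM_BLENDMODE value, second value, and the mask written in between
merged = apply_bp_mask(0x0054A9, 0x0034A1, 0x001FE3)
print(f"logic mode: {logic_mode(0x0054A9)} -> {logic_mode(merged)}")
```

Running the same helper over a whole fifolog would make it easier to spot other commands that set bits outside their mask.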

Using Visual Studio's debugger and the software renderer, and with only object 248 enabled in the FIFO player, EfbInterface::BlendColor always satisfies srcClr[0] == 0, srcClr[1] == srcClr[2], srcClr[1] == srcClr[3], srcClr[1] >= 0x8c, and srcClr[1] <= 0xb2, i.e. grays between #8c8c8c and #b2b2b2 (inclusive), which is consistent with what we see ingame.  Since srcClr[0] is 0, srcFactor is 0 and dstFactor is 0xffffffff, the destination color remains unchanged.  With only object 248 enabled, the rendered screen remains black since nothing else is drawn, so all elements of dstClr are 0.
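
Why an alpha of 0 leaves the destination untouched follows directly from the src_alpha / 1-src_alpha factors. The sketch below is a simplified Python version of that blend equation; the rounding is an assumption and does not match EfbInterface::BlendColor exactly.

```python
def blend_src_alpha(src_rgb, dst_rgb, src_alpha):
    """out = src * a + dst * (1 - a) per channel, with a = src_alpha / 255."""
    inv_alpha = 255 - src_alpha
    return tuple((s * src_alpha + d * inv_alpha) // 255
                 for s, d in zip(src_rgb, dst_rgb))


# A gray cube pixel with alpha 0 drawn over an arbitrary destination color
print(blend_src_alpha((0x8C, 0x8C, 0x8C), (0x12, 0x34, 0x56), 0x00))
```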
Of course, this is with #9122, which made the debug cubes invisible by setting their alpha to 0.  Reverting that and using 0xffffffff as the default color, the alpha channel srcClr[0] does vary, and I've seen values from 0 to 0xff.  Clearly, the variance isn't actually coming from that function.
BlendColor is called by BlendTev which is called by Tev::Draw.  The same conditions I found for srcClr apply to output by the end of that function.  bpmem.ztex2.op is ZTexOp::Disabled; bpmem.fog.c_proj_fsel.fsel is FogType::Off; and late_ztest is false (as bpmem.zcontrol.early_ztest is true).  Furthermore numindstages and numtevstages are both 0 — but the loop over numtevstages uses <= (while numindstages uses <).  So, that loop is the only place where things could change in Tev::Draw.

Object 248 sets up a LOT of different TEV registers, the majority of which aren't used.  Here's a shortened list, skipping a TON of earlier updates to things we don't care about (some of which are set to the same value several times):
000010b4:  BP  28 387040   BPMEM_TREF
...
0000113b:  BP  c0 08f8af   BPMEM_TEV_COLOR_ENV Tev stage 0
00001140:  BP  c1 08f2f0   BPMEM_TEV_ALPHA_ENV Tev stage 0
00001145:  BP  10 000000   BPMEM_IND_CMD command 0
0000114a:  BP  f6 e338c4   BPMEM_TEV_KSEL number 0
0000114a:  BP  f7 e338ce   BPMEM_TEV_KSEL number 1


BP register BPMEM_TREF number 0
Configuration: Stage 0 texmap: 0
Stage 0 tex coord: 0
Stage 0 enable texmap: Yes
Stage 0 color channel: Color chan 0 (0)
Stage 1 texmap: 7
Stage 1 tex coord: 0
Stage 1 enable texmap: No
Stage 1 color channel: Zero (7)
BP register BPMEM_TEV_COLOR_ENV Tev stage 0
Tev stage: 0
a: ZERO (15)
b: tex.rgb (8)
c: ras.rgb (10)
d: ZERO (15)
Bias: 0 (0)
Op: Add (0) / Comparison: Greater than (0)
Clamp: Yes
Scale factor: 1 (0) / Compare mode: R8 (0)
Dest: prev (0)
BP register BPMEM_TEV_ALPHA_ENV Tev stage 0
Tev stage: 0
a: ZERO (7)
b: tex (4)
c: ras (5)
d: ZERO (7)
Bias: 0 (0)
Op: Add (0) / Comparison: Greater than (0)
Clamp: Yes
Scale factor: 1 (0) / Compare mode: R8 (0)
Dest: prev (0)
Ras sel: 0
Tex sel: 0
BP register BPMEM_IND_CMD command 0
Indirect stage: Indirect tex stage ID: 0
Format: ITF_8 (0)
Bias: None (0)
Bump alpha: Off (0)
Offset matrix ID: 0
Regular coord S wrapping factor: Off (0)
Regular coord T wrapping factor: Off (0)
Use modified texture coordinates for LOD computation: No
Add texture coordinates from previous TEV stage: No
BP register BPMEM_TEV_KSEL number 0
Number 0: Swap 1: 0
Swap 2: 1
Color sel 0: Konst 0 (color only) (12)
Alpha sel 0: Konst 0 Alpha (28)
Color sel 1: Konst 0 (color only) (12)
Alpha sel 1: Konst 0 Alpha (28)
BP register BPMEM_TEV_KSEL number 1
Number 1: Swap 1: 2
Swap 2: 3
Color sel 0: Konst 0 (color only) (12)
Alpha sel 0: Konst 0 Alpha (28)
Color sel 1: Konst 0 (color only) (12)
Alpha sel 1: Konst 0 Alpha (28)


Walking through the relevant code, the texture is first sampled, and then swizzled trivially (mapping RGBA to RGBA).  The konst values are computed, with m_KonstLUT[12] == 0x00ffffff and m_KonstLUT[28] == 0xffffffff, and thus StageKonst = 0xffffffff.  SetRasColor is then called... but actually, this isn't relevant.
It turns out that TexColor also has varying alpha values (with values from 0 through 0xff observed).  So TextureSampler::Sample is producing varying alpha.
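
With a and d both ZERO, TEV stage 0 reduces to tex multiplied by ras for both color and alpha. A simplified Python sketch of one channel of that stage, ignoring bias, scale and the exact rounding; the `c + (c >> 7)` step, which lets 0xff act as 1.0, is an assumption about how the software renderer treats the c input.

```python
def tev_combine(a, b, c, d):
    """One TEV channel with Op: Add, bias 0, scale 1, clamp on (rounding simplified)."""
    c = c + (c >> 7)  # stretch 0..255 to 0..256 so that 0xff acts as 1.0
    blended = (a * (256 - c) + b * c) >> 8
    return max(0, min(255, d + blended))


def stage0_alpha(tex_alpha, ras_alpha):
    """Alpha stage as configured above: a = ZERO, b = tex, c = ras, d = ZERO."""
    return tev_combine(0, tex_alpha, ras_alpha, 0)


for ras_alpha in (0x00, 0xFF):
    print(f"ras alpha {ras_alpha:#04x}: tex alpha 0xb3 -> {stage0_alpha(0xB3, ras_alpha):#04x}")
```

So a rasterized alpha of 0 hides whatever the texture sampler returns, while 0xff passes the texture alpha through unchanged.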

Alright, values used by TextureSampler.
000010a0:  BP  88 30fc3f   BPMEM_TX_SETIMAGE0 Texture Unit 0
000010a5:  BP  94 05335b   BPMEM_TX_SETIMAGE3 Texture Unit 0
000010aa:  BP  80 01c1d5   BPMEM_TX_SETMODE0 Texture Unit 0
000010af:  BP  84 003000   BPMEM_TX_SETMODE1 Texture Unit 0


BP register BPMEM_TX_SETIMAGE0 Texture Unit 0
Texture Unit: 0
Width: 64
Height: 64
Format: IA8 (3)
BP register BPMEM_TX_SETIMAGE3 Texture Unit 0
Texture Unit: 0
Source address (32 byte aligned): 0xA66B60
BP register BPMEM_TX_SETMODE0 Texture Unit 0
Texture Unit: 0
Wrap S: Repeat (1)
Wrap T: Repeat (1)
Mag filter: Linear (1)
Mip mode: Mip linear (2)
Min filter: Linear (1)
LOD type: Diagonal LOD (1)
LOD bias: -32 (-1)
Max aniso: 1 (0)
LOD/bias clamp: No
BP register BPMEM_TX_SETMODE1 Texture Unit 0
Texture Unit: 0
Min LOD: 0 (0)
Max LOD: 48 (3)


Looking at the texture itself at address 80A66B60, the texture data is composed of ff ff repeated for 256 bytes, followed by ff c9 for 256 bytes, 8 times (so 0x1000 bytes like that), then ff c9 for 256 bytes then ff ff for 256 bytes, again 8 times (so 0x1000 bytes like that) for 0x2000 total bytes (64×64×2).  It seems reasonable to assume that this produces the checkerboard pattern with 0xffffffff and 0xffc9c9c9 as the colors.
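
The checkerboard assumption can be tested by rebuilding the data as described and decoding it. This Python sketch assumes the usual IA8 layout of 4×4 texel tiles with two bytes per texel (alpha first, then intensity); both function names are made up.

```python
def build_texture():
    """Recreate the 0x2000 bytes described above."""
    white = bytes([0xFF, 0xFF]) * 128  # 256 bytes of ff ff
    gray = bytes([0xFF, 0xC9]) * 128   # 256 bytes of ff c9
    return (white + gray) * 8 + (gray + white) * 8


def decode_ia8(data, width, height):
    """Decode tiled IA8 data into rows of (alpha, intensity) pairs."""
    pixels = [[None] * width for _ in range(height)]
    offset = 0
    for tile_y in range(0, height, 4):
        for tile_x in range(0, width, 4):
            for y in range(4):
                for x in range(4):
                    pixels[tile_y + y][tile_x + x] = (data[offset], data[offset + 1])
                    offset += 2
    return pixels


pixels = decode_ia8(build_texture(), 64, 64)
for y, x in ((0, 0), (0, 32), (32, 0), (32, 32)):
    alpha, intensity = pixels[y][x]
    print(f"({x:2}, {y:2}): alpha {alpha:#04x}, intensity {intensity:#04x}")
```

Under that layout the data is a 2×2 checkerboard of 32×32 squares, and every texel in the base level has an alpha of 0xff.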
TextureLod[0] appears to be in the range of 0 through 0x30 (inclusive), and TextureLinear[0] is always true.
In TextureSampler::Sample, if I define ALLOW_MIPMAP to be 0 (and comment out the now-unused declaration of mipLinear), the alpha value in Tev::Draw is always 0xff, and the colors are all between 0xc9 and 0xff (inclusive).  So it's the mipmapping that's responsible for the alpha behavior, even though the texture itself has no transparency, oddly.  This is confirmed by dumping textures; tex1_64x64_m_6372ad0caa2e04f8_3_arb.png is 64x64 with #ffffffff and #ffc9c9c9, while tex1_64x64_m_6372ad0caa2e04f8_3_arb_mip1.png has an alpha of 179 and is 32x32 with #b3ffffff and #b3c9c9c9; tex1_64x64_m_6372ad0caa2e04f8_3_arb_mip2.png has an alpha of 102 and is 16x16 with #66ffffff and #66c9c9c9; and tex1_64x64_m_6372ad0caa2e04f8_3_arb_mip3.png has an alpha of 0 and is 8x8 with #00ffffff and #00c9c9c9.
I don't fully understand why mipmapping is doing this to the alpha, but actually, since the cubes were still doing this on console when I forced the alpha channel to 0xff (but they were black since the color channels were still 0), the mipmapping aspect probably isn't the cause of them being invisible either.  In other words, since mipmapping still affected the alpha the same way on console with the other channels left as black, mipmapping is probably not responsible for the color being transparent black in the first place.

Later edit, with hardware testing: GXInitTexObjLOD is a function that sets TexMode0.  Interestingly, min_filter is one 3-bit variable there, as it previously was in Dolphin.  This uses a lookup table called gx::GX2HWFiltConv to convert between its min_filter parameter and the actual one on hardware; the contents of this table are 0 4 1 5 2 6 0 0, i.e. the field I now call min_filter is set if the parameter's lowest bit is set, and the field I now call min_mip (but maybe should be mip_filter) is set to the parameter's upper 2 bits.  Modifying this table to be 0 4 0 4 0 4 0 0 will make the mipmap filter always use MipMode::None.  Here's the patch:
8040cee0/00405420: 0004000400040000
802a6788/002c16c8: 3bc00009

The modified disc image has a SHA-1 of e28445655ab0eeaa5f2039e385e3332e7f5d0d5a or an MD5 of 0d757edd370cb40d132ee6908bd7a991.  The result of this is that the cubes don't show up on console or on Dolphin, and when Dolphin is tweaked to show the cubes, they still have transparency.  Conclusion: I need to patch JSystem::GX2HWFiltConv for JSystem::loadTexNo instead.  New patch:
8040cc70/004051b0: 0004000400040000
802a6788/002c16c8: 3bc00009

This new disc image has a SHA-1 of 16d9b163a11f46f1c58b4833bada55e9688496a2 and an MD5 of 5335bbb7c343103fdddd0534699df6ae.  Also, it's very obvious that mipmapping is disabled as water now looks extra sparkly.  The cubes still don't show up in Dolphin or on console, but if Dolphin is tweaked to show them, they no longer seem to be transparent.  The render-order issues with the rotating cube still apply, oddly.  Still, this shows that mipmapping is not the cause of them being invisible.
Adding a patch to J3DGDSetChanCtrl as well (just forcing matsource to 0 for everything):
802f3648/0030E588: 6000000038c00000

The new disc image has a SHA-1 of 6202df98b4ba7e3b2cf79f7c5bafeb9da523a1fc and an MD5 of 9a2b69d6290f2f36233d7ae1c6ef9815.  With these changes the cubes show up (with gray, not black) on vanilla Dolphin and on console, and still do not appear to be transparent (but still have the render order issues).  This confirms that mipmapping is the cause of them being transparent at further distances.
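
The GX2HWFiltConv tables described earlier can be sanity-checked with a few lines of Python. This is only a sketch: the split of the 3-bit hardware value into a linear bit (bit 2) and a mip mode (bits 0-1) is an assumption based on the description of the table, and the function names are made up.

```python
ORIGINAL_TABLE = [0, 4, 1, 5, 2, 6, 0, 0]
PATCHED_TABLE = [0, 4, 0, 4, 0, 4, 0, 0]


def to_hw_filter(param):
    """Formula form of the original table for parameter values 0..5."""
    return ((param & 1) << 2) | (param >> 1)


def split_hw_filter(hw):
    """Split the 3-bit hardware value into (linear bit, mip mode)."""
    return hw >> 2, hw & 3


for param in range(6):
    original = split_hw_filter(ORIGINAL_TABLE[param])
    patched = split_hw_filter(PATCHED_TABLE[param])
    print(f"param {param}: original {original}, patched {patched}")
```

With the patched table the mip mode is 0 for every parameter value, while the linear bit is unchanged.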

  
## update7.md

      
A few weeks ago JMC mentioned that a brawl hack had the same issue, this time affecting hands on the character selection screen (fifo).  This additional data point is nice because it's a fairly simple scene, and the hands are 2D.
The player 1 hand is object 228, using this PR which splits objects on EFB copies (I think it's 226 without it).  The other hands are 233, 238, and 243.  The channel 0 material color for player 1 is ffd9d7ff, the color config is 702 (lighting enabled, material source is material color register), and the alpha config is 701 (lighting disabled, material source from vertex).  There is no color data present.  This results in dolphin using 0 for the hand's alpha, when the mod expects ff.  (If they had enabled use of the material color register for alpha, everything would work correctly).
Of note is that Brawl has BPMEM_ZMODE enable updates disabled and BPMEM_ZCOMPARE early depth test enabled (this is set in object 86).  Those two values match Sunshine, but every other game had updates enabled and early depth test disabled.
I also managed to get the hardware fifoplayer to run.  There were a few challenges.  For one, it only works with fifologs whose version is set to 1.  If the version is 2 or higher, it tries to use a different format that, as far as I can tell, doesn't match the actual changes made in version 2; hex-editing the start of the fifolog usually seems to work.  Additionally, it only works if an SD card is inserted.  But most annoyingly, it requires a network connection to work, and at least on my Wii that only seems to happen 25% of the time (not counting the many times where wiiload failed or the Homebrew Channel didn't initialize the network properly).  If it exits soon after starting, it's probably a network failure.  Also note that it displays whatever was previously in memory until you load a fifolog over the network using the provided tool; this might be green garbage data, or some BootMii textures, or a previous fifolog.
The hardware fifo player is also neat in that you can edit the fifolog over the network and it will apply your changes immediately.  Simply double-click the hex code, and then click off of it afterwards to send your changes.  Some BP registers also have a UI for editing.  But be warned that if you manually edit a BP register, there is a space before the hex that MUST be deleted or else it won't save correctly.  Also note that it uses the old object numbering scheme where primitive data is at the start of an object, so when toggling primitives, select the object after the one you'd expect (this doesn't count different numbering due to splitting on EFBs).  Oh, and also, you will need to disable primitives for some objects in some cases as there are vertex explosions otherwise; this affects the Mario model in Super Mario Sunshine.  And some games just don't work at all, not sure why.
Using the hardware fifo player I did confirm that the debug cubes don't show up, even when only that object is enabled, but they do still show up if forced to use the material color register for both color and alpha.  I also confirmed that the trucks in Monster Jam and the hand in the Smash mod do show up if they're the only things enabled.  So the cubes not showing up probably isn't related to the colors of previously rendered vertices.
I also tested whether the clear color for the previous EFB copy mattered; Sunshine clears with 00000000 while Monster Jam uses ff000000 (alpha = ff) and Smash uses ffffffff.  However, changing these values for Smash didn't affect whether the hands rendered, so they are also probably not relevant.

  
## update8.md

      
A typo while working on the hackfix resulted in some interesting fifoci differences.  The typo was that I was dividing by 256 instead of 255, which meant that default colors were slightly wrong (presumably 0xFEFEFEFE instead of 0xFFFFFFFF, though I'm not 100% sure of this).  What's interesting is that this affected more games than I expected it would; two custom Brawl characters (this and this) were affected, as well as Tony Hawk and depth tests in a Spider-Man game (though that last one seems to be more of a timing issue that results in garbage data; I'm not entirely sure what's going on with it).
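As a rough sanity check of the 0xFEFEFEFE guess, here is what happens to an 8-bit channel that is normalized with the wrong divisor and then scaled back by 255.  This is a sketch under assumptions, not Dolphin's actual code: the real conversion path isn't known here, truncation is assumed (rounding gives the same answer for this input), and `roundtrip` and `pack` are hypothetical helpers.

```python
def roundtrip(value, divisor):
    """Normalize an 8-bit channel to a float, then scale back to 0..255."""
    as_float = value / divisor
    return int(as_float * 255)  # truncates toward zero


def pack(channel):
    """Repeat one 8-bit channel value across all four bytes of a color."""
    return (channel << 24) | (channel << 16) | (channel << 8) | channel


print(hex(pack(roundtrip(255, 256))))  # divisor from the typo
print(hex(pack(roundtrip(255, 255))))  # intended divisor
```

With a divisor of 256, 255 becomes 0.99609375 and comes back as 254 (0xFE), while a divisor of 255 returns 255 exactly, which fits the presumed 0xFEFEFEFE versus 0xFFFFFFFF.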
For Lloyd, the first affected object seems to be number 359.  Despite the alpha material source being the vertex, it doesn't seem like alpha being 0 does anything (possibly the game has blending set up in a different way), but the actual color IS wrong.  It's worth noting that the entire problem that led to the fifolog being on fifo.ci is that the character is too bright.  Dolphin's implementation has always had RGB be #FFFFFF; the past attempted debug cube fix only changed alpha.  Setting it to black (MissingColorValue = 0x00000000 under [Hacks] in GFX.ini) confirms that the color is the same undefined one, and neither #FFFFFF nor #000000 match.  #808080 seems closer to correct, but I don't know if that's actually it.
Blaziken seems to suffer from the same problem.  Unfortunately its issue doesn't have a screenshot of the expected behavior, but #808080 looks nicer.  That one was created because of an early debug cube fix, and indeed, starting at object 83, the same thing seems to apply.
Apparently Tony Hawk did have a similar invisible characters issue in the past.  I don't know if that actually still applies.
(Object numbers are without splitting on EFB copies.)

| Field | Mario Sunshine | Mario Kart | Monkey Ball |
| --- | --- | --- | --- |
| Hex | 70f | 702 | 706 |
| matsource | true | false | false |
| enablelighting | true | true | true |
| lightMask0_3 | 3 | 0 | 1 |
| ambsource | false | false | false |
| diffusefunc | CLAMP (2) | CLAMP (2) | CLAMP (2) |
| attnfunc | SPOT (3) | SPOT (3) | SPOT (3) |
| lightMask4_7 | 0 | 0 | 0 |

	diff --git a/Source/Core/VideoBackends/Software/SWVertexLoader.cpp b/Source/Core/VideoBackends/Software/SWVertexLoader.cpp
	index 380d6dda0d..95c026c8d7 100644
	--- a/Source/Core/VideoBackends/Software/SWVertexLoader.cpp
	+++ b/Source/Core/VideoBackends/Software/SWVertexLoader.cpp
	@@ -186,10 +186,7 @@ static void ParseColorAttributes(InputVertexData* dst, DataReader& src,
	const PortableVertexDeclaration& vdec)
	{
	const auto set_default_color = [](u8* color, int i) {
	- // The default alpha channel seems to depend on the number of components in the vertex format.
	- const auto& g0 = g_main_cp_state.vtx_attr[g_main_cp_state.last_id].g0;
	- const u32 color_elements = i == 0 ? g0.Color0Elements : g0.Color1Elements;
	- color[0] = color_elements == 0 ? 255 : 0;
	+ color[0] = (xfmem.color[0].enablelighting && xfmem.color[1].enablelighting) ? 0 : 255;
	color[1] = 255;
	color[2] = 255;
	color[3] = 255;