The fifolog in question is ss-map, downloadable here. I will be discussing only the green part of the map, but all of the main parts of the map render the same way (with the main configuration happening in object 1). The objects in question are: 1: brown (pathways); 2: dark brown (rocks?); 3: yellow (sand); 4: green (grass); 5: dark green (unknown, not much uses this); 6: grey (stone); 7: red (roof); 8: brown (roof). Object 9, water, has its own configuration which I haven't investigated, but the water seems to all be at the same elevation anyway.
The border shadows around the map are also dynamically generated; they use EFB copies. The relevant objects are currently numbered 16 and 17 as of Dolphin 5.0-14344, and their vertex data is at offsets 00047466 and 000476d5 respectively.
Matrix configuration (00000463):
XF register XFMEM_SETMATRIXINDA Matrix index A: PosNormal: 0 Tex0: 30 Tex1: 33 Tex2: 36 Tex3: 39
Texgen 0 configuration (00000439):
XF register XFMEM_SETTEXMTXINFO Matrix 0 Projection: STQ (3x4 matrix) (1) Input form: ABC1 (1) Tex gen type: Regular (0) Source row: Geometry (input is ABC1) (0) Emboss source shift: 0 Emboss light shift: 0
Matrix values for texgen 0 (00000283):
XF register Write 12 XF mem words at 0078
Position matrix row 30 col 0 = 0.0008333333 Position matrix row 30 col 1 = 0 Position matrix row 30 col 2 = 0 Position matrix row 30 col 3 = 0.5
Position matrix row 31 col 0 = 0 Position matrix row 31 col 1 = 0 Position matrix row 31 col 2 = -0.00083333324 Position matrix row 31 col 3 = 0.5
Position matrix row 32 col 0 = 0 Position matrix row 32 col 1 = 0 Position matrix row 32 col 2 = 0 Position matrix row 32 col 3 = 1
Texgen 1 configuration (0000044b):
XF register XFMEM_SETTEXMTXINFO Matrix 1 Projection: ST (2x4 matrix) (0) Input form: ABC1 (1) Tex gen type: Regular (0) Source row: Geometry (input is ABC1) (0) Emboss source shift: 0 Emboss light shift: 0
Matrix values for texgen 1 (000002b8):
XF register Write 8 XF mem words at 0084
Position matrix row 33 col 0 = 0 Position matrix row 33 col 1 = 0.0009090909 Position matrix row 33 col 2 = 0 Position matrix row 33 col 3 = 0.18181819
Position matrix row 34 col 0 = 0 Position matrix row 34 col 1 = 0 Position matrix row 34 col 2 = 0 Position matrix row 34 col 3 = 0
The input model has y as the depth axis, while the output one has z as the depth axis; the input model is also vertically flipped compared to the output. Both models are provided below (SkywardSwordMap_in.stl and SkywardSwordMap_out.stl); note that the out model also needed to have triangles reversed to render properly. The input coordinates range from (-8802.63184, -3711.02588, -22749.94141) to (14476.52441, 4380.22363, 6468.75635); the output coordinates range from (-0.33568, -0.74345, 0.15503) to (0.39234, 0.8837, 0.63101).
The main thing to note is that the model for the map does not contain any texture coordinates; it only contains positions. The transform unit (XF) actually generates the texture coordinates based on the position data. Texture coordinate 0 is based on the x and z coordinates of the (input) model, while texture coordinate 1 is based on the y value. Specifically, texture coordinate 0 is
(x/1200 + .5, -z/1200 + .5, 1), and texture coordinate 1 is
(y/1100 + 2/11, 0); texture coordinate 0 ranges from about (-6.84, -4.89) to (12.56, 19.45), while texture coordinate 1 ranges from about (-3.19, 0) to (4.16, 0). Texture coordinate 1 is not wrapped (I'll explain why when I get to what it's used for below), so in terms of sampling the texture it ranges from (0, 0) to (1, 0). SkywardSwordMap_out_texcoord.stl shows the map with texture coordinate 1 (clamped to the range of 0 – 1) replacing the z coordinate.
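As a rough illustration of the texgen step, the sketch below applies the two matrices from the register dumps to a position in ABC1 form (x, y, z, 1). The names `TEX0_MATRIX`, `TEX1_MATRIX`, `texgen`, and `clamp01` are made up for this example and are not Dolphin or SDK identifiers; the matrix entries are written as the fractions 1/1200, 1/1100, and 2/11 that the dumped floats approximate.

```python
# Texgen matrices, using the fractions that the dumped floats approximate.
TEX0_MATRIX = [
    [1 / 1200, 0.0, 0.0, 0.5],    # S = x/1200 + 0.5
    [0.0, 0.0, -1 / 1200, 0.5],   # T = -z/1200 + 0.5
    [0.0, 0.0, 0.0, 1.0],         # Q = 1 (STQ projection)
]
TEX1_MATRIX = [
    [0.0, 1 / 1100, 0.0, 2 / 11],  # S = y/1100 + 2/11
    [0.0, 0.0, 0.0, 0.0],          # T = 0
]


def texgen(matrix, position):
    """Multiply a texgen matrix by a position in ABC1 form (x, y, z, 1)."""
    x, y, z = position
    vec = (x, y, z, 1.0)
    return tuple(sum(m * v for m, v in zip(row, vec)) for row in matrix)


def clamp01(value):
    """Clamp to 0-1, mirroring how texture coordinate 1 is not wrapped."""
    return min(1.0, max(0.0, value))


# Corner built from the minimum x/y and maximum z of the input model.
corner = (-8802.63184, -3711.02588, 6468.75635)
s0, t0, q0 = texgen(TEX0_MATRIX, corner)
s1, t1 = texgen(TEX1_MATRIX, corner)
print(round(s0, 2), round(t0, 2), round(s1, 2), clamp01(s1))
```

Feeding in the extreme input coordinates reproduces the approximate ranges quoted above, and `clamp01` shows why everything below a certain height ends up at the same end of texture 1.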
I'm guessing the developers went for this approach to make it easier to edit the map; they could force either the darkest or lightest value by putting map geometry at a high or low position, but were able to easily generate gradients without needing to manually apply texture coordinates.
TEV stage 0
Indirect stage configuration (000002fb):
BP register BPMEM_IREF Stage 0 ntexmap: 1 Stage 0 ntexcoord: 1 ...
TEV stage 0 color configuration (00000337), just directly use texture color:
BP register BPMEM_TEV_COLOR_ENV Tev stage 0 dest.rgb = tex.rgb a: ZERO (15) b: ZERO (15) c: ZERO (15) d: tex.rgb (8) Bias: 0 (0) Op: Add (0) / Comparison: Greater than (0) Clamp: Yes Scale factor: 1 (0) / Compare mode: R8 (0) Dest: prev (0)
TEV stage 0/1 texture configuration (00000355):
BP register BPMEM_TREF number 0 Stage 0 texmap: 0 Stage 0 tex coord: 0 Stage 0 enable texmap: Yes Stage 0 color channel: Zero (7) ...
TEV stage 0 indirect configuration (0000032d):
BP register BPMEM_IND_CMD command 0 Indirect tex stage ID: 0 Format: ITF_3 (3) Bias: None (0) Bump alpha: Off (0) Offset matrix index: Matrix 0 (1) Offset matrix ID: Indirect (0) Regular coord S wrapping factor: 32 (4) Regular coord T wrapping factor: 32 (4) Use modified texture coordinates for LOD computation: Yes Add texture coordinates from previous TEV stage: No
Indirect matrix (0000031e) - only md is nonzero, with an effective value of 0.03125 × 2^(3+8+16-17) = 0.03125 × 2^10 = 32/1024 × 1024 = 32.
BP register BPMEM_IND_MTXA Matrix 0 Matrix 0 column 0 (A) Row 0 (ma): 0 (0) Row 1 (mb): 0 (0) Scale bits: 3 (shifted: 3)
BP register BPMEM_IND_MTXB Matrix 0 Matrix 0 column 1 (B) Row 0 (mc): 0 (0) Row 1 (md): 0.03125 (32) Scale bits: 2 (shifted: 8)
BP register BPMEM_IND_MTXC Matrix 0 Matrix 0 column 2 (C) Row 0 (me): 0 (0) Row 1 (mf): 0 (0) Scale bits: 1 (shifted: 16)
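The arithmetic behind the effective matrix value can be sketched as follows. This assumes, based only on the numbers shown in the dump, that each of the three registers contributes two scale bits (shifted by 0, 2, and 4), that the combined scale is biased by 17, and that the raw matrix entry has 10 fractional bits; the function names are hypothetical.

```python
def indirect_matrix_exponent(scale_a, scale_b, scale_c):
    """Combine the 2-bit scale fields of MTXA/MTXB/MTXC and remove the bias of 17."""
    combined = scale_a | (scale_b << 2) | (scale_c << 4)  # 3 + 8 + 16 = 27 here
    return combined - 17


def effective_matrix_entry(raw, exponent):
    """Scale a raw matrix entry (10 fractional bits) by 2**exponent."""
    return raw / 1024 * 2 ** exponent


exponent = indirect_matrix_exponent(3, 2, 1)
print(exponent, effective_matrix_entry(32, exponent))
```

With the values from this fifolog the exponent works out to 10, so the raw entry of 32 (0.03125) ends up as an offset of 32 texels per unit of the indirect T coordinate.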
There are two textures in use: a 32 by 128 grass texture (texture 0) and a 96 by 4 mostly transparent texture (texture 1). Texture 0 is composed of 4 different 32x32 grass textures of varying brightness stacked vertically. On the other hand texture 1 is composed of three 32 by 4 sections where the alpha value increases within each section (starting from 0 afterwards), and the R/G/B values are the same within a section but increase between sections. (I'll clarify the exact values in use later.)
The indirect stage has wrapping enabled (with the wrapping factor set to 32). This means that texture coordinate 0 wraps around in the range of 0-32. (Texture coordinate 0 was also multiplied by 31 elsewhere in the process, so this wrapping happens a reasonable number of times.) This ensures that it sticks within a single 32 by 32 box in the grass texture. However, on its own, this would just mean that it always uses the top, darkest texture.
The indirect texture itself is what's used to choose which section to render. Depending on the value of texture coordinate 1, the T coord generated from the B channel of the indirect texture will be 0, 1, or 2. This is multiplied with the indirect matrix to produce a value of 0, 32, or 64 (and also to zero out the S and U coordinates). That value is then added to the wrapped regular coordinate, meaning that depending on texture coordinate 1, it will draw a part of the 32 by 128 texture between (0, 0)-(31, 31), (0, 32)-(31, 63), or (0, 64)-(31, 95).
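A minimal sketch of the stage 0 lookup described above, working in whole texels. The function name `stage0_texel` is hypothetical, and Python's `%` stands in for the hardware's wrapping; real hardware works on fixed-point coordinates, so this only shows how the wrap and the indirect offset combine.

```python
def stage0_texel(s, t, band, matrix_value=32, wrap=32):
    """Return the texel of texture 0 that TEV stage 0 samples.

    s, t: regular texture coordinate 0, already scaled to texels.
    band: T value from the indirect texture (0, 1, or 2).
    """
    wrapped_s = s % wrap            # stays inside one 32-texel-wide tile
    wrapped_t = t % wrap            # 0-31 within a tile
    offset_t = band * matrix_value  # 0, 32, or 64 after the indirect matrix
    return wrapped_s, wrapped_t + offset_t


print(stage0_texel(70, 45, 0))  # top (darkest) tile
print(stage0_texel(70, 45, 2))  # third tile
```

The same position lands on the same texel within a tile each time; only the indirect value decides which of the stacked tiles is read.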
TEV stage 1
TEV stage 1 color configuration (0000035f):
BP register BPMEM_TEV_COLOR_ENV Tev stage 1 dest.rgb = (1 - ras.aaa)*prev.rgb + ras.aaa*tex.rgb a: prev.rgb (0) b: tex.rgb (8) c: ras.aaa (11) d: ZERO (15) Bias: 0 (0) Op: Add (0) / Comparison: Greater than (0) Clamp: Yes Scale factor: 1 (0) / Compare mode: R8 (0) Dest: prev (0)
TEV stage 0/1 texture configuration (00000355):
BP register BPMEM_TREF number 0 ... Stage 1 texmap: 0 Stage 1 tex coord: 0 Stage 1 enable texmap: Yes Stage 1 color channel: Norm alpha bump (6)
TEV stage 1 indirect configuration (00000350):
BP register BPMEM_IND_CMD command 1 Indirect tex stage ID: 0 Format: ITF_3 (3) Bias: T (2) Bump alpha: S (1) Offset matrix index: Matrix 0 (1) Offset matrix ID: Indirect (0) Regular coord S wrapping factor: 32 (4) Regular coord T wrapping factor: 32 (4) Use modified texture coordinates for LOD computation: Yes Add texture coordinates from previous TEV stage: No
I've skipped some data that was already shown in TEV stage 0.
Now, that might seem suspicious because the region of texture 0 from (0, 96)-(31, 127) isn't used. There's also a second problem: directly jumping between different brightnesses would be jarring and wouldn't convey slopes well. TEV stage 1 is configured to solve both of those problems.
Texture 0 is sampled in basically the same way as in TEV stage 0; however, the indirect stage has bias enabled for the T coordinate. In this case, the bias function adds 1 to T, so instead of 0/1/2 it is set to 1/2/3 (which becomes 32/64/96 when multiplied by the indirect matrix). This means that the texture value for TEV stage 1 is one "unit" brighter than in TEV stage 0.
TEV stage 1 then blends its texture value with the one from the previous stage. To do this, it uses a feature referred to as "bump alpha". (I'm not sure why it's called "bump alpha", as although in this case the value comes from the alpha value of the texture, it also can come from the blue or green channels; it also doesn't need to be used for transparency and in this case it's definitely not used for bumping.) Here, it takes the S coordinate from the indirect texture (i.e. the alpha channel in texture 1). It is then normalized into the range of 0-255, passed to the rasterizer, and used for blending. Since texture 1's alpha value increases within each section, this means that values closer to the start of the section (right after it changed from a different texture) have small values for the bump alpha while ones near the end of the section (right before it changes to a different texture) have large values. This value is then used to lerp the TEV stage 0 and stage 1 texture values with an equation along the lines of
(1 - bump_alpha)*stage_0_tex + bump_alpha*stage_1_tex, producing a smooth transition.
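The way stages 0 and 1 fit together can be sketched like this. The names `blend_rows` and `lerp_color` are invented for the example, and the blend uses floating point with the bump alpha divided by 255; the actual TEV does this in integer arithmetic, so treat the weights as approximate.

```python
TILE = 32  # height in texels of one grass tile in texture 0


def blend_rows(band, bump_alpha):
    """Return (stage 0 T offset, stage 1 T offset, stage 1 weight).

    band: indirect T value (0, 1, or 2); bump_alpha: normalized, 0-255.
    """
    stage0_offset = band * TILE        # 0, 32, or 64
    stage1_offset = (band + 1) * TILE  # bias adds 1, giving 32, 64, or 96
    weight = bump_alpha / 255          # share of the stage 1 texel in the result
    return stage0_offset, stage1_offset, weight


def lerp_color(stage0_rgb, stage1_rgb, weight):
    """(1 - weight)*stage0 + weight*stage1, per channel."""
    return tuple((1 - weight) * a + weight * b
                 for a, b in zip(stage0_rgb, stage1_rgb))


print(blend_rows(2, 255))
print(lerp_color((10, 20, 30), (50, 60, 70), 0.5))
```

At the start of a section the result is entirely the darker tile, and at the end it is entirely the next brighter one, which is also what brings the fourth tile at rows 96 to 127 into use.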
TEV stage 2
TEV stage 2 color configuration (0000037d):
BP register BPMEM_TEV_COLOR_ENV Tev stage 2 dest.rgb = c0.rgb*prev.rgb a: ZERO (15) b: prev.rgb (0) c: c0.rgb (2) d: ZERO (15) Bias: 0 (0) Op: Add (0) / Comparison: Greater than (0) Clamp: Yes Scale factor: 1 (0) / Compare mode: R8 (0) Dest: prev (0)
TEV stage 2 indirect configuration (00000373), set to 0:
BP register BPMEM_IND_CMD command 2 Indirect tex stage ID: 0 Format: ITF_8 (0) Bias: None (0) Bump alpha: Off (0) Offset matrix index: Off (0) Offset matrix ID: Indirect (0) Regular coord S wrapping factor: Off (0) Regular coord T wrapping factor: Off (0) Use modified texture coordinates for LOD computation: No Add texture coordinates from previous TEV stage: No
TEV color 0 RA and BG (000003be) (BG is repeated 3 times, which libogc claims is to flush the write gather pipe):
BP register BPMEM_TEV_COLOR_RA Tev register 1 Type: Color (0) Alpha: 0ff Red: 0ff
BP register BPMEM_TEV_COLOR_BG Tev register 1 Type: Color (0) Green: 0ff Blue: 0ff
Stage 2 is used to tint the result of the previous stages. However, the tint color is always set to white, so this doesn't actually do anything. But the functionality does exist.
Implementation details and what went wrong
Dolphin already had all of this implemented before. So why didn't it work?
Because the way that it was getting data from the indirect texture was incorrect.
Nearly everything that uses indirect textures uses the ITF_8 indirect format. The indirect format is separate from the texture format: the texture format indicates how to convert bytes into RGBA colors, while the indirect format indicates what bits in those colors are used for indirect offsetting, and what bits are used for bump alpha. ITF_8 indicates that all bits are used for both indirect offsets and bump alpha, which is great if you're only using indirect offsets, and still fine if you're not since you get 3 color values to work with (S, T, and U texture coordinates from the alpha, blue, and green channels). However, ITF_8 also causes the bias function to subtract 128, instead of adding 1. Although this is useful in some cases, it wouldn't work here. And, bump alpha is a 5-bit value, so it's not particularly useful to have more bits for it. The other formats (ITF_5, ITF_4, and ITF_3) let you use 5, 4, or 3 bits from a color channel for the indirect offset while leaving the remaining 3, 4, or 5 bits for the bump alpha value.
Dolphin's problem? It used the wrong bits for those modes. As documented in libogc, "Bits for the indirect offsets are extracted from the high end of each component byte. Bits for the bump alpha are extraced off the low end of the byte." Dolphin had it backwards: it used the upper bits for bump alpha, and the lower bits for indirect offsets.
To give some actual numbers: the indirect texture (texture 1)'s alpha channel values went from 0 to 31 in each section (so 0 to 31, repeated 3 times), while the blue values were 0, 32, and 64 in each section respectively. With the indirect format used here, ITF_3, that meant that the bump alpha bits ranged from 0 to 31, and the indirect offset was 0, 1, or 2. As a 5-bit field, bump alpha is actually shifted left by 3, giving a value of 0 to 248 (this is probably how Dolphin got it reversed, as in ITF_8 it's implemented by just masking with 248), and can optionally be normalized by re-using the top 3 bits as the bottom ones (which gives a range of 0 to 255 instead; it's analogous to changing 0, 10, 20, ..., 90, 100, 110, ..., 980, 990 into 0, 10, 20, ..., 90, 101, 111, ..., 989, 999). In this case, the incorrect implementation of ITF_3 meant that the indirect offset was always 0, while the bump alpha was 0 to 3 (0 to 24 after shifting). This did mean that the map was not a solid color, but the color differences were extremely subtle.
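The bit split can be modelled with a few lines of Python. This is a sketch of the behavior described in this write-up, not Dolphin's actual code: `split_channel`, `split_channel_reversed`, and `normalize_bump_alpha` are hypothetical names, `offset_bits` is 5, 4, or 3 for ITF_5, ITF_4, or ITF_3, and ITF_8 is left out because it uses all bits for both purposes.

```python
def split_channel(byte, offset_bits):
    """Correct split: indirect offset from the high bits, bump alpha from the low bits."""
    alpha_bits = 8 - offset_bits
    offset = byte >> alpha_bits
    bump = (byte & ((1 << alpha_bits) - 1)) << offset_bits  # moved to the top of the byte
    return offset, bump


def split_channel_reversed(byte, offset_bits):
    """Reversed split, as in the bug: offset from the low bits, bump alpha from the high bits."""
    offset = byte & ((1 << offset_bits) - 1)
    bump = (byte >> offset_bits) << offset_bits
    return offset, bump


def normalize_bump_alpha(bump):
    """Re-use the top 3 bits as the bottom 3, mapping 0-248 onto 0-255."""
    return bump | (bump >> 5)


for blue in (0, 32, 64):
    print(blue, split_channel(blue, 3)[0], split_channel_reversed(blue, 3)[0])
print(split_channel(31, 3)[1], split_channel_reversed(31, 3)[1])
print(normalize_bump_alpha(248))
```

With the correct split, the blue values 0, 32, and 64 select offsets 0, 1, and 2 and an alpha of 31 yields a bump alpha of 248; with the reversed split every offset collapses to 0 and the bump alpha never exceeds 24.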
Other possible implementations
To finish things off, there were a few other ways this could have been implemented.
- Instead of using ITF_3 and the bump function, they could have used ITF_8, and used separate values in the indirect texture for TEV stages 0 and 1. This would have required a second indirect matrix (which is fine as there are 3 of them available) to choose the green channel instead of the blue channel, and also would have required changing the texture format to RGBA8 (where all 4 channels are separately specified; this is needed even though the red channel isn't used) instead of IA8 (where alpha and a value shared by R/G/B are specified), which would have doubled the size of the texture.
- They also could have stuck with ITF_3 and put both the bump alpha and the indirect offset in the same channel. Then they could use I8 for the texture, and halve the texture size. Of course, there are only 384 pixels in the texture, so this really doesn't matter much. But it seems kinda silly to have a function for splitting bits from a value, and then use separate channels as well.
- They probably shouldn't have used the normalized bump alpha function. This is because the normalization doesn't make sense when lerping textures; a value of 31 becomes 255, which tells it to entirely use the texture for stage 1 and ignore the one for stage 0. But the next value over, 0, tells it to use the texture for stage 0 and ignore the one for stage 1, when that's the same texture that would have been used in stage 0 beforehand. The only thing normalization does ensure is that the brightest texture is used in its entirety, at the cost of slightly worse fades. I did test this, and the difference seems to be basically unnoticeable here.