ArcaneNibble/_jpeg_doc.md

## _jpeg_doc.md

      
    OUTDATED PLEASE GO TO https://github.com/rqou/m1n1/tree/jpeg

Apple Silicon JPEG encoder/decoder reverse engineering notes

WARNING: Although this document attempts to only describe functional aspects of the hardware, it currently does contain descriptions of algorithms that have been reverse engineered from driver code.
Please read the Asahi Linux reverse engineering policy (https://asahilinux.org/copyright/#reverse-engineering-policy) before continuing.
General

REG_0x0 (+0x0000)

This register is not understood yet.
The driver resets this register to 0.
REG_0x4 (+0x0004)

This register is not understood yet.
The driver resets this register to 0.
MODE (+0x0008)

This register controls the mode of operation of the hardware. The details of this register are not understood yet.
This register is set to multiple different values throughout the reset process.
REG_0xc (+0x000C)

This register is not understood yet.
The driver reads this register and stores the value after an interrupt occurs.
(+0x0010)

No access to this register has been observed.
(+0x0014)

No access to this register has been observed.
(+0x0018)

No access to this register has been observed.
(+0x001C)

No access to this register has been observed.
REG_0x20 (+0x0020)

This register is not understood yet.
The driver resets this register to 0xff, and it is written with a 0 after an interrupt occurs.
STATUS (+0x0024)


bit0: Operation is completed ???
bit1: Timeout occurred
bit2: Read buffer overflow
bit3: Write buffer overflow
bit4: Codec buffer overflow
bit5: Some kind of error, happens if macroblock settings are messed up
bit6: AXI error
bit7: The driver checks for this after an interrupt, but the meaning is not understood
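
As a rough illustration of how the bits listed above might be inspected after an interrupt, the sketch below maps a raw STATUS value to names. The names follow the `R_STATUS` class in jpeg_test.py; `UNKNOWN_BIT7` and the helper `decode_status` are made up for this example and are not taken from the driver.

```python
# Bit positions as documented above. Bit 7 is checked by the driver, but its
# meaning is unknown, so the name used here is only a placeholder.
STATUS_BITS = {
    0: "DONE",
    1: "TIMEOUT",
    2: "RD_BUF_OVERFLOW",
    3: "WR_BUF_OVERFLOW",
    4: "CODEC_BUF_OVERFLOW",
    5: "SOME_KIND_OF_MACROBLOCK_SIZE_ERROR",
    6: "AXI_ERROR",
    7: "UNKNOWN_BIT7",
}


def decode_status(value):
    """Return the names of all bits set in a raw STATUS value."""
    return [name for bit, name in STATUS_BITS.items() if value & (1 << bit)]


print(decode_status(0x01))
print(decode_status(0x42))
```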

CODEC (+0x0028)

This register controls how the JPEG data is processed with respect to the chroma subsampling mode. It affects both encode and decode.

0 = 4:4:4
1 = 4:2:2
2 = 4:1:1
3 = 4:2:0
4 = 4:0:0

REG_0x2c (+0x002C)

This register is not fully understood yet.
The driver sets this register to 0 when decoding and 1 when encoding. If it is not set to 1 when encoding, only headers will be output. The interrupt handler makes a decision based on this register.
REG_0x30 (+0x0030)

This register is not understood yet.
The driver resets this register to 0.
REG_0x34 (+0x0034)

This register is not fully understood yet.
The driver sets this register to 1 when decoding and 0 when encoding. If it is not set to 0 when encoding, the output will be corrupted in some way.
REG_0x38 (+0x0038)

This register is not fully understood yet.
The driver sets this register to 0 when decoding and 1 when encoding. If it is not set to 1 when encoding, nothing will be output. If it is set to 1 when decoding, the output will be a weird tiled format.
Chroma control

CHROMA_HALVE_H_TYPE1 (+0x003c)

CHROMA_HALVE_H_TYPE2 (+0x0040)

Setting either of these registers to 1 causes chroma to be subsampled horizontally.
The second register produces a different result from the first register. It is speculated that this is related to chroma siting, but this has not been verified yet. If both the second and the first register are set, the second appears to win.
CHROMA_HALVE_V_TYPE1 (+0x0044)

CHROMA_HALVE_V_TYPE2 (+0x0048)

Setting either of these registers to 1 causes chroma to be subsampled vertically.
The second register produces a different result from the first register. It is speculated that this is related to chroma siting, but this has not been verified yet. If both the second and the first register are set, the second appears to win.
CHROMA_DOUBLE_H (+0x004c)

Setting this register to 1 causes chroma to be doubled/interpolated horizontally.
CHROMA_QUADRUPLE_H (+0x0050)

Setting this register to 1 causes chroma to be quadrupled/interpolated horizontally. If both this register and CHROMA_DOUBLE_H are set, doubling appears to win.
CHROMA_DOUBLE_V (+0x0054)

Setting this register to 1 causes chroma to be doubled/interpolated vertically.
Pixel data control

PX_USE_PLANE1 (+0x0058)

Setting this register to 1 enables use of the second pixel plane.
PX_TILES_W (+0x005c)

This register specifies the width of the image in tiles/MCUs/macroblocks. The macroblock width depends on the chroma subsampling mode: the pixel width is divided, rounding up, by 8 for 4:4:4, by 16 for 4:2:2 and 4:2:0, and by 32 for 4:1:1 (FIXME verify this again).
PX_TILES_H (+0x0060)

This register specifies the height of the image in tiles/MCUs/macroblocks. The macroblock height depends on the chroma subsampling mode: the pixel height is divided, rounding up, by 16 for 4:2:0 and by 8 otherwise (FIXME verify this again).
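
The tile counts described for PX_TILES_W and PX_TILES_H can be sketched as below. This is only an illustration of the rounding, keyed by the CODEC register value. The table name `MACROBLOCK_SIZE` and the helper `tile_counts` are hypothetical, and the 8x8 entry for 4:0:0 is an assumption, since these notes do not state a macroblock width for that mode.

```python
# Macroblock (width, height) in pixels per CODEC value, as described above.
MACROBLOCK_SIZE = {
    0: (8, 8),    # 4:4:4
    1: (16, 8),   # 4:2:2
    2: (32, 8),   # 4:1:1
    3: (16, 16),  # 4:2:0
    4: (8, 8),    # 4:0:0 (assumption, not stated in the notes)
}


def divroundup(val, div):
    """Integer division that rounds up."""
    return (val + div - 1) // div


def tile_counts(width_px, height_px, codec):
    """Return (PX_TILES_W, PX_TILES_H) for an image of the given pixel size."""
    mb_w, mb_h = MACROBLOCK_SIZE[codec]
    return divroundup(width_px, mb_w), divroundup(height_px, mb_h)


print(tile_counts(100, 50, 0))
print(tile_counts(100, 50, 3))
```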
PX_PLANE0_WIDTH (+0x0064)

This register specifies the width of the image data in plane 0, in bytes, minus 1. When decoding, it is important to set this correctly for the edge to be processed properly.
PX_PLANE0_HEIGHT (+0x0068)

This register specifies the height of the image data in plane 0, in rows, minus 1. When decoding, it might be important to set this correctly for the edge to be processed properly.
PX_PLANE0_TILING_H (+0x006c)

This register somehow controls how pixel data matches up with subsampled chroma data, but the details are not understood yet. Valid range 0-31.
PX_PLANE0_TILING_V (+0x0070)

This register somehow controls how pixel data matches up with subsampled chroma data, but the details are not understood yet. Valid range 0-31.
PX_PLANE0_STRIDE (+0x0074)

This is the row stride of plane 0 in bytes.
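
A minimal sketch of how the plane 0 width, height, and stride values relate, assuming a 4-byte-per-pixel surface such as the RGB888 layout used in jpeg_test.py. The helper name `plane0_geometry` and its parameters are invented for this example.

```python
def plane0_geometry(width_px, height_px, bytes_per_pixel=4, stride=None):
    """Return plane 0 register values for a tightly packed or padded surface."""
    row_bytes = width_px * bytes_per_pixel
    if stride is None:
        stride = row_bytes  # tightly packed rows
    if stride < row_bytes:
        raise ValueError("stride is smaller than one row of pixel data")
    return {
        "PX_PLANE0_WIDTH": row_bytes - 1,   # bytes, minus 1
        "PX_PLANE0_HEIGHT": height_px - 1,  # rows, minus 1
        "PX_PLANE0_STRIDE": stride,         # bytes per row
    }


print(plane0_geometry(640, 480))
```

Passing an explicit `stride` models a surface whose rows are padded beyond the visible image width.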
PX_PLANE1_WIDTH (+0x0078)

PX_PLANE1_HEIGHT (+0x007c)

PX_PLANE1_TILING_H (+0x0080)

PX_PLANE1_TILING_V (+0x0084)

PX_PLANE1_STRIDE (+0x0088)

These registers function similarly to the plane 0 registers.
Input/output pointers

INPUT_START1 (+0x008c)

Input pointer 1 IOVA.
INPUT_START2 (+0x0090)

Input pointer 2 IOVA.
REG_0x94 (+0x0094)

This register is not understood yet.
The driver sets this register to a fixed value of 0x1f when decoding and to a value that depends on the chroma subsampling mode when encoding (0xc for 4:4:4, 0x8 for 4:2:2, 0x3 for 4:2:0, 0xb for 4:0:0), but changing it does not seem to do anything.
REG_0x98 (+0x0098)

This register is not understood yet.
The driver sets this register to a fixed value of 1 when decoding and to a value that depends on the chroma subsampling mode when encoding (2 for 4:4:4/4:2:2/4:0:0, 1 for 4:2:0), but changing it does not seem to do anything.
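
Since the meaning of REG_0x94 and REG_0x98 is unknown, the most that can be done is to reproduce the observed values. The lookup below is a sketch of that; the table and function names are hypothetical, and no value is listed for 4:1:1 because none is given above.

```python
# (REG_0x94, REG_0x98) values the driver was observed to write when encoding.
ENCODE_REG_0x94_0x98 = {
    "4:4:4": (0xC, 2),
    "4:2:2": (0x8, 2),
    "4:2:0": (0x3, 1),
    "4:0:0": (0xB, 2),
}

# Fixed values observed when decoding.
DECODE_REG_0x94_0x98 = (0x1F, 1)


def unknown_input_regs(encoding, subsampling="4:4:4"):
    """Return the (REG_0x94, REG_0x98) pair observed for the given operation."""
    if encoding:
        return ENCODE_REG_0x94_0x98[subsampling]
    return DECODE_REG_0x94_0x98


print(unknown_input_regs(True, "4:2:0"))
print(unknown_input_regs(False))
```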
INPUT_END (+0x009c)

End of input data IOVA.
For reasons that are not understood, this is ORed with 7 when encoding.
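
The INPUT_END computation can be sketched as follows. The helper `input_end` is a made-up name; the OR with 7 simply mirrors the observed driver behavior for encoding and is not explained by anything known about the hardware.

```python
def input_end(input_iova, input_sz, encoding):
    """Return the value to write to INPUT_END for a buffer at input_iova."""
    end = input_iova + input_sz
    if encoding:
        end |= 7  # observed driver behavior; the reason is unknown
    return end


print(hex(input_end(0x4000, 0x100, encoding=True)))
print(hex(input_end(0x4000, 0x100, encoding=False)))
```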
OUTPUT_START1 (+0x00a0)

Output pointer 1 IOVA.
OUTPUT_START2 (+0x00a4)

Output pointer 2 IOVA.
OUTPUT_END (+0x00a8)

End of output data IOVA.
MATRIX_MULT (+0x00ac-0x00d7) (11 entries)

Color space conversion matrix.
The full details of the shifting, the offset, and the final two values are not understood yet.
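
The values that the decode path in jpeg_test.py writes to the first nine MATRIX_MULT entries look consistent with the standard JFIF YCbCr to RGB coefficients stored row by row in signed 8.8 fixed point. That interpretation is an assumption based only on those values, not something confirmed about the hardware. The sketch below shows the conversion; `to_fixed_8_8` and `JFIF_YCBCR_TO_RGB` are names invented for this example.

```python
def to_fixed_8_8(coef):
    """Convert a float to signed 8.8 fixed point, stored as a 32-bit word."""
    return round(coef * 256) & 0xFFFFFFFF


# Standard JFIF YCbCr -> RGB coefficients (rows: R, G, B; columns: Y, Cb, Cr).
JFIF_YCBCR_TO_RGB = [
    [1.0, 0.0, 1.402],
    [1.0, -0.344136, -0.714136],
    [1.0, 1.772, 0.0],
]

# Flatten row by row (assumed register order).
matrix_words = [to_fixed_8_8(c) for row in JFIF_YCBCR_TO_RGB for c in row]
print([hex(w) for w in matrix_words])
```

The nine words produced this way equal the values the script writes to MATRIX_MULT[0] through MATRIX_MULT[8]. The script also writes 0 and 0xffffff80 to the last two entries, which this sketch does not attempt to explain.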
DITHER (+0x00d8-0x00ff) (10 entries)

Dithering when decoding to RGB565.
The full details of this are not understood yet.
Encoding pixel format

ENCODE_PIXEL_FORMAT (+0x0100)


0 = RGB101010
1 = YUV10 linear (partially tested, details not fully understood)
2 = RGB888
3 = RGB565
4 = YUV planar (partially tested, details not fully understood)
5 = YUV linear (partially tested, details not fully understood)

ENCODE_COMPONENT0_POS (+0x0104)

ENCODE_COMPONENT1_POS (+0x0108)

ENCODE_COMPONENT2_POS (+0x010c)

ENCODE_COMPONENT3_POS (+0x0110)

These registers control the position of each component in the parsed pixel data. They allow, for example, switching between RGBA and BGRA.
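
A comment in jpeg_test.py notes that for RGB888 the R, G, B values are byte positions. Under the assumption that components 0 to 2 are R, G, B and component 3 is alpha (the notes do not confirm this mapping), the position values for a given byte order could be derived as in this sketch. The helper `component_positions` is hypothetical.

```python
def component_positions(byte_order):
    """Map a 4-letter byte order such as "RGBA" to the four *_POS values.

    Returns (pos_R, pos_G, pos_B, pos_A), assumed to correspond to
    ENCODE_COMPONENT0_POS through ENCODE_COMPONENT3_POS.
    """
    order = byte_order.upper()
    if sorted(order) != sorted("RGBA"):
        raise ValueError("byte_order must be a permutation of 'RGBA'")
    return tuple(order.index(ch) for ch in "RGBA")


print(component_positions("RGBA"))
print(component_positions("BGRA"))
```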
CONVERT_COLOR_SPACE (+0x0114)

Setting this register to 1 enables color space conversion when encoding.
Unknown

REG_0x118 (+0x0118)

This register is not understood yet.
This register is set to 0 when decoding and 1 when encoding.
REG_0x11c (+0x011c)

This register is not understood yet.
This register is set to 1 when decoding and 0 when encoding.
REG_0x120 (+0x0120)

This register is not understood yet.
The driver resets this register to 0.
UNTESTED_SURFACE_TILING (+0x0124)

This register is not understood yet.
The driver sets this register to 1 when decoding if the surface "is tiled," but setting this register doesn't seem to actually do anything.
REG_0x128 (+0x0128)

This register is not understood yet.
The driver resets this register to 0.
REG_0x12c (+0x012c)

This register is not understood yet.
The driver resets this register to 0.
Decoding image size

DECODE_MACROBLOCKS_W (+0x0130)

Sets the width of the decoded image in macroblocks. The macroblock width depends on the chroma subsampling mode: the pixel width is divided, rounding up, by 8 for 4:4:4, by 16 for 4:2:2 and 4:2:0, and by 32 for 4:1:1.
DECODE_MACROBLOCKS_H (+0x0134)

Sets the height of the decoded image in macroblocks. The macroblock height depends on the chroma subsampling mode: the pixel height is divided, rounding up, by 16 for 4:2:0 and by 8 otherwise.
RIGHT_EDGE_PIXELS (+0x0138)

The driver sets this to the number of pixels that are valid in the rightmost macroblocks, but changing it does not seem to do anything.
BOTTOM_EDGE_PIXELS (+0x013c)

The driver sets this to the number of pixels that are valid in the bottommost macroblocks, but changing it does not seem to do anything.
RIGHT_EDGE_SAMPLES (+0x0140)

The driver sets this to the number of chroma samples that are valid in the rightmost macroblocks, but changing it does not seem to do anything.
BOTTOM_EDGE_SAMPLES (+0x0144)

The driver sets this to the number of chroma samples that are valid in the bottommost macroblocks, but changing it does not seem to do anything.
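
The four edge registers can be computed the way the decode path in jpeg_test.py does for 8x8 macroblocks. The sketch below generalizes the macroblock size as a parameter; `edge_pixels` and `edge_regs` are hypothetical helper names, and halving the pixel counts to obtain chroma samples follows the script rather than any verified hardware behavior.

```python
def divroundup(val, div):
    """Integer division that rounds up."""
    return (val + div - 1) // div


def edge_pixels(size_px, macroblock_px=8):
    """Number of valid pixels in the last macroblock along one axis."""
    full = divroundup(size_px, macroblock_px) * macroblock_px
    return size_px - full + macroblock_px


def edge_regs(width_px, height_px, mb_w=8, mb_h=8):
    """Return the four edge register values for an image of the given size."""
    right = edge_pixels(width_px, mb_w)
    bottom = edge_pixels(height_px, mb_h)
    return {
        "RIGHT_EDGE_PIXELS": right,
        "BOTTOM_EDGE_PIXELS": bottom,
        "RIGHT_EDGE_SAMPLES": right // 2,
        "BOTTOM_EDGE_SAMPLES": bottom // 2,
    }


print(edge_regs(100, 96))
```

An image whose size is an exact multiple of the macroblock size yields a full macroblock (8 here), not 0.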
SCALE_FACTOR (+0x0148)


0 = /1
1 = /2
2 = /4
3 = /8

The driver checks that scaling is /1 when encoding, but it is not yet tested what happens if this is not the case.
Decoding pixel format

DECODE_PIXEL_FORMAT (+0x014c)


0 = YUV 444 (2P)
1 = YUV 422 (2P)
2 = YUV 420 (2P)
3 = YUV 422 (1P)
4 = driver mentions YUV10 444 (1P) but it does not appear to work (driver also says it doesn't work)
5 = RGB888
6 = RGB565
7 = driver mentions RGB101010 but it does not appear to work (driver also says it doesn't work)

YUV422_ORDER (+0x0150)


0 = Cb Y'0 Cr Y'1
1 = Y'0 Cb Y'1 Cr

RGBA_ORDER (+0x0154)


0 = BGRA
1 = RGBA

RGBA_ALPHA (+0x0158)

This value is written to the alpha bytes when decoding into RGB888.
Unknown/status

REG_0x15c (+0x015c)

This register is not understood yet.
The driver resets this register to 0.
REG_0x160 (+0x0160)

This register is not understood yet.
The driver resets this register to a configurable value that happens to be 0.
REG_0x164 (+0x0164)

This register is not understood yet.
The driver reads this register and stores the value after an interrupt occurs.
(+0x0168)

No access to this register has been observed.
REG_0x16c (+0x016c)

This register is not understood yet.
The driver reads this register and stores the value after an interrupt occurs.
REG_0x170 (+0x0170)

This register is not understood yet.
The driver reads this register and stores the value after an interrupt occurs.
(+0x0174)

No access to this register has been observed.
PERFCOUNTER (+0x0178)

This register appears to be a performance counter. It is not yet understood what is being measured.
The driver reads this register and accumulates it after an interrupt occurs.
(+0x017c)

No access to this register has been observed.
(+0x0180)

No access to this register has been observed.
TIMEOUT (+0x0184)

This register configures the timeout. It is not yet understood what units this is in.
HWREV (+0x0188)

This register contains the hardware revision. On the M1 Max, it is 0xd1013.
(+0x018c)

No access to this register has been observed.
(+0x0190)

No access to this register has been observed.
(+0x0194)

No access to this register has been observed.
(+0x0198)

No access to this register has been observed.
REG_0x19c (+0x019c)

This register is not understood yet.
The driver under some conditions writes a 1 here.
RST logging / unknown

ENABLE_RST_LOGGING (+0x01a0)

If this register is set to 1, some data about RST blocks will be logged when encoding.
RST_LOG_ENTRIES (+0x01a4)

This register will contain the number of RST log entries.
REG_0x1a8 (+0x01a8)

REG_0x1ac (+0x01ac)

REG_0x1b0 (+0x01b0)

REG_0x1b4 (+0x01b4)

REG_0x1b8 (+0x01b8)

REG_0x1bc (+0x01bc)

REG_0x1c0 (+0x01c0)

REG_0x1c4 (+0x01c4)

REG_0x1c8 (+0x01c8)

REG_0x1cc (+0x01cc)

REG_0x1d0 (+0x01d0)

REG_0x1d4 (+0x01d4)

REG_0x1d8 (+0x01d8)

REG_0x1dc (+0x01dc)

REG_0x1e0 (+0x01e0)

REG_0x1e4 (+0x01e4)

REG_0x1e8 (+0x01e8)

REG_0x1ec (+0x01ec)

REG_0x1f0 (+0x01f0)

REG_0x1f4 (+0x01f4)

REG_0x1f8 (+0x01f8)

These registers are not understood yet.
Compressed pixel format / Compressed DMA / unknown

REG_0x1fc (+0x01fc)

REG_0x200 (+0x0200)

REG_0x204 (+0x0204)

REG_0x208 (+0x0208)

REG_0x20c (+0x020c)

REG_0x210 (+0x0210)

REG_0x214 (+0x0214)

REG_0x218 (+0x0218)

REG_0x21c (+0x021c)

REG_0x220 (+0x0220)

REG_0x224 (+0x0224)

REG_0x228 (+0x0228)

REG_0x22c (+0x022c)

REG_0x230 (+0x0230)

REG_0x234 (+0x0234)

REG_0x238 (+0x0238)

REG_0x23c (+0x023c)

REG_0x240 (+0x0240)

REG_0x244 (+0x0244)

REG_0x248 (+0x0248)

REG_0x24c (+0x024c)

REG_0x250 (+0x0250)

REG_0x254 (+0x0254)

REG_0x258 (+0x0258)

REG_0x25c (+0x025c)

These registers are not understood yet.
(+0x0260-0x0fff)

No access to these registers has been observed.
JPEG I/O related

JPEG_IO_FLAGS (+0x1000)


bit0-2 control subsampling mode output into the JPEG when encoding

0 = 4:4:4
1 = 4:2:2
2 = 4:2:0
3 = monochrome
4 = 4 components ??? seems to work with 422 with 444 tiling params ?????
6 = indicate 4:1:1 in file, but setting CODEC = 2 doesn't actually work (broken)


bit3 must be set when decoding and must be unset when encoding. This is not fully understood yet.
bit4 causes macroblocks to not be flipped horizontally. It affects both encoding and decoding.
bit5 causes chunks of 8 bytes to not be reversed. It affects both encoding and decoding.
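
The bit layout above can be packed as in the following sketch. The function name and keyword arguments are invented for illustration; only the bit positions come from these notes.

```python
def pack_jpeg_io_flags(subsampling, decode, unflipped_h=True, unreversed_8byte=True):
    """Pack a JPEG_IO_FLAGS value from the fields described above."""
    if not 0 <= subsampling <= 7:
        raise ValueError("subsampling must fit in bits 0-2")
    value = subsampling          # bits 0-2: subsampling mode written to the JPEG
    if decode:
        value |= 1 << 3          # bit 3: set for decode, clear for encode
    if unflipped_h:
        value |= 1 << 4          # bit 4: do not flip macroblocks horizontally
    if unreversed_8byte:
        value |= 1 << 5          # bit 5: do not reverse chunks of 8 bytes
    return value


print(hex(pack_jpeg_io_flags(2, decode=False)))
```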

REG_0x1004 (+0x1004)

This register is not understood yet.
(+0x1008)

No access to this register has been observed.
QTBL_SEL (+0x100c)

This register selects the quantization table in use for each component.

bit0-1 = component 0
bit2-3 = component 1
bit4-5 = component 2
bit6-7 = component 3?
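
A small sketch of packing the per-component table selectors into QTBL_SEL, following the 2-bit fields listed above. The helper `pack_qtbl_sel` is hypothetical, and the component 3 field is a guess in the notes themselves.

```python
def pack_qtbl_sel(c0, c1, c2, c3=0):
    """Pack four 2-bit quantization table indices into a QTBL_SEL value."""
    value = 0
    for index, table in enumerate((c0, c1, c2, c3)):
        if not 0 <= table <= 3:
            raise ValueError("table index must fit in 2 bits")
        value |= table << (2 * index)  # component N occupies bits 2N..2N+1
    return value


# Luma on table 0, both chroma components on table 1.
print(hex(pack_qtbl_sel(0, 1, 1)))
```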

HUFFMAN_TABLE (+0x1010)

This register controls Huffman tables used. The details of this register are not fully understood yet.
RST_INTERVAL (+0x1014)

This register controls the interval at which RST markers will be generated when encoding.
JPEG_HEIGHT (+0x1018)

This register specifies the height of the JPEG when encoding. It appears to only affect the header.
JPEG_WIDTH (+0x101c)

This register specifies the width of the JPEG when encoding.
COMPRESSED_BYTES (+0x1020)

This register will contain the final size of the JPEG when encoding.
JPEG_OUTPUT_FLAGS (+0x1024)


bit0 doesn't seem to do anything
bit1 = output only SOS/EOI, no SOI/DQT/SOF0/DHT
bit2 = output SOF0 after DHT instead of before it
bit3 doesn't seem to do anything
bit4 not sure exactly what this does, but it makes compression worse

REG_0x1028 (+0x1028)

This register is not understood yet.
The driver sets this register to 0x400 when decoding.
REG_0x102c (+0x102c)

This register is not understood yet.
The driver reads this register and does something with it after an interrupt occurs.
BITSTREAM_CORRUPTION (+0x1030)

This register is not understood yet. It supposedly contains information about bitstream corruption.
(+0x1034-0x10ff)

No access to these registers has been observed.
QTBL (+0x1100-0x11ff)

Quantization tables. The exact layout is not understood yet (zigzag or not?)
(+0x1200-0x1fff)

No access to these registers has been observed.
RSTLOG (+0x2000-0x2fff)

RST log. The details of this are not understood yet.
(+0x3000-0x3fff)

No access to these registers has been observed.

  
## jpeg_test.py
# WARNING WARNING WARNING WARNING WARNING
# This code is "tainted" -- it is derived from disassembled/decompiled macOS code
# Please read the Asahi Linux reverse engineering policy before continuing
# https://asahilinux.org/copyright/#reverse-engineering-policy
# WARNING WARNING WARNING WARNING WARNING


from proxyclient.m1n1.setup import *
from proxyclient.m1n1.hw.dart import DART, DARTRegs
from proxyclient.m1n1.utils import *
import struct
import sys
import time
from PIL import Image, ImageDraw

def divroundup(val, div):
    return (val + div - 1) // div

##### THIS FOR ENCODE

if len(sys.argv) < 2:
    print(f"Usage: {sys.argv[0]} input_img")
    sys.exit(1)

input_fn = sys.argv[1]

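image_data = bytearray()
with Image.open(input_fn) as im:
    # Convert to 3-channel RGB so the unpacking below also works for
    # paletted, grayscale, and RGBA inputs.
    im = im.convert("RGB")
    W, H = im.size

    for y in range(H):
        for x in range(W):
            r, g, b = im.getpixel((x, y))
            # RGB888 input: 4 bytes per pixel, with an opaque alpha byte
            image_data += struct.pack("BBBB", r, g, b, 255)
            # image_data += struct.pack("<H", (r >> 3) | ((b >> 2) << 5) | ((g >> 3) << 11))
            # image_data += struct.pack("<I", (r << 2) | (b << 12) | (g << 22))
image_data = bytes(image_data)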

input_sz = W*H*4
input_sz_aligned = align_up(input_sz)

output_sz = W*H*4
output_sz_aligned = align_up(output_sz)

print(f"Using size {input_sz_aligned:08X} for input image")
print(f"Using size {output_sz_aligned:08X} for output data")

# ##### THIS FOR DECODE

# if len(sys.argv) < 2:
#     print(f"Usage: {sys.argv[0]} input.jpg")
#     sys.exit(1)

# input_fn = sys.argv[1]

# with open(input_fn, 'rb') as f:
#     jpeg_data = f.read()

# found_sof0 = False

# jpeg_work = jpeg_data
# while jpeg_work:
#     seg_marker = struct.unpack(">H", jpeg_work[:2])[0]
#     print(f"Seg {seg_marker:04X}")
#     if seg_marker == 0xFFD8:
#         # SOI
#         jpeg_work = jpeg_work[2:]
#     elif seg_marker == 0xFFDA:
#         # SOS
#         break
#     else:
#         seg_len = struct.unpack(">H", jpeg_work[2:4])[0]
#         assert seg_len >= 2
#         # print(seg_len)
#         seg_data = jpeg_work[4:4 + seg_len - 2]
#         # print(seg_data)
#         jpeg_work = jpeg_work[4 + seg_len - 2:]

#         if seg_marker == 0xFFC0:
#             # SOF0
#             assert not found_sof0
#             found_sof0 = True
#             sof0 = struct.unpack(">BHHB", seg_data[:6])
#             # print(sof0)
#             (jpeg_bpp, jpeg_H, jpeg_W, jpeg_components_cnt) = sof0
#             assert jpeg_bpp == 8
#             assert jpeg_components_cnt == 3
#             jpeg_components = []
#             for i in range(jpeg_components_cnt):
#                 comp_id, comp_sampling, comp_quant = seg_data[6+3*i:6+3*i+3]
#                 # print(comp_id, comp_sampling, comp_quant)
#                 jpeg_components.append((comp_id, comp_sampling >> 4, comp_sampling & 0xF, comp_quant))
#             # print(jpeg_components)

#             # assert jpeg_components == [(1, 2, 2, 0), (2, 1, 1, 1), (3, 1, 1, 1)]

# assert found_sof0
# print(f"JPEG is {jpeg_W}x{jpeg_H}")

# jpeg_sz_aligned = align_up(len(jpeg_data))
# print(f"Using size {jpeg_sz_aligned:08X} for JPEG data")

# # FIXME how much larger do we need? needed +16 at least once, set large for debugging
# output_W = 2 * jpeg_W
# output_H = 2 * jpeg_H
# output_img_sz = 4 * output_W * output_H
# output_img_sz_aligned = align_up(output_img_sz)
# print(f"Using size {output_img_sz_aligned:08X} for output image")

class R_STATUS(Register32):
    DONE = 0
    TIMEOUT = 1
    RD_BUF_OVERFLOW = 2
    WR_BUF_OVERFLOW = 3
    CODEC_BUF_OVERFLOW = 4
    SOME_KIND_OF_MACROBLOCK_SIZE_ERROR = 5
    AXI_ERROR = 6


class R_JPEG_IO_FLAGS(Register32):
    # 0x0 = 4:4:4
    # 0x1 = 4:2:2
    # 0x2 = 4:2:0
    # 0x3 = monochrome
    # 0x4 = 4 components ??? seems to work with 422 with 444 tiling params ?????
    # 0x6 = indicate 4:1:1 in file, but setting CODEC = 2 doesn't actually work (broken)
    SUBSAMPLING_MODE = 2, 0
    # not sure what this is supposed to do
    MAKE_DECODE_WORK_BREAK_ENCODE = 3
    OUTPUT_MACROBLOCKS_UNFLIPPED_H = 4
    OUTPUT_8BYTE_CHUNKS_CORRECTLY = 5


class R_JPEG_OUTPUT_FLAGS(Register32):
    # bit0 doesn't seem to do anything
    SKIP_HEADERS = 1            # output only SOS/EOI, no SOI/DQT/SOF0/DHT
    OUTPUT_SOF0_AFTER_DHT = 2   # output SOF0 after DHT instead of before it
    # bit3 doesn't seem to do anything
    COMPRESS_WORSE = 4          # not sure exactly what this does


class R_QTBL_SEL(Register32):
    COMPONENT0 = 1, 0
    COMPONENT1 = 3, 2
    COMPONENT2 = 5, 4
    COMPONENT3 = 7, 6     # guessed

class JPEGRegs(RegMap):
    REG_0x0 = 0x0, Register32
    REG_0x4 = 0x4, Register32
    MODE = 0x8, Register32
    REG_0xc = 0xc, Register32

    REG_0x20 = 0x20, Register32
    STATUS = 0x24, R_STATUS

    # 0 = YUV 444
    # 1 = YUV 422
    # 2 = YUV 411
    # 3 = YUV 420
    # 4 = YUV 400
    CODEC = 0x28, Register32

    REG_0x2c = 0x2c, Register32
    REG_0x30 = 0x30, Register32
    REG_0x34 = 0x34, Register32
    REG_0x38 = 0x38, Register32   # this changes the output drastically if set to 1 for decode; breaks encode if not set to 1

    # not sure what the difference is. siting? type2 seems to win over type1
    CHROMA_HALVE_H_TYPE1 = 0x3c, Register32
    CHROMA_HALVE_H_TYPE2 = 0x40, Register32
    CHROMA_HALVE_V_TYPE1 = 0x44, Register32
    CHROMA_HALVE_V_TYPE2 = 0x48, Register32

    # if double and quadruple both set --> double
    CHROMA_DOUBLE_H = 0x4c, Register32
    CHROMA_QUADRUPLE_H = 0x50, Register32
    CHROMA_DOUBLE_V = 0x54, Register32

    # details not fully understood yet
    PX_USE_PLANE1 = 0x58, Register32
    PX_TILES_W = 0x5c, Register32
    PX_TILES_H = 0x60, Register32
    PX_PLANE0_WIDTH = 0x64, Register32
    PX_PLANE0_HEIGHT = 0x68, Register32
    PX_PLANE0_TILING_H = 0x6c, Register32
    PX_PLANE0_TILING_V = 0x70, Register32
    PX_PLANE0_STRIDE = 0x74, Register32
    PX_PLANE1_WIDTH = 0x78, Register32
    PX_PLANE1_HEIGHT = 0x7c, Register32
    PX_PLANE1_TILING_H = 0x80, Register32
    PX_PLANE1_TILING_V = 0x84, Register32
    PX_PLANE1_STRIDE = 0x88, Register32

    INPUT_START1 = 0x8c, Register32
    INPUT_START2 = 0x90, Register32
    REG_0x94 = 0x94, Register32
    REG_0x98 = 0x98, Register32
    INPUT_END = 0x9c, Register32

    OUTPUT_START1 = 0xa0, Register32
    OUTPUT_START2 = 0xa4, Register32
    OUTPUT_END = 0xa8, Register32

    MATRIX_MULT = irange(0xAC, 11, 4), Register32
    DITHER = irange(0xD8, 10, 4), Register32

    # 0 = RGB101010
    # 1 = YUV10 linear (partially tested, details not understood)
    # 2 = RGB888
    # 3 = RGB565
    # 4 = YUV planar (partially tested, details not understood)
    # 5 = YUV linear (partially tested, details not understood)
    ENCODE_PIXEL_FORMAT = 0x100, Register32
    # RGB888: R, G, B = byte pos
    # RGB101010: R, G, B = 0/1/2 = low/mid/high bits
    # RGB565: R, G, B = 0/1/2 = low/mid/high bits
    ENCODE_COMPONENT0_POS = 0x104, Register32
    ENCODE_COMPONENT1_POS = 0x108, Register32
    ENCODE_COMPONENT2_POS = 0x10c, Register32
    ENCODE_COMPONENT3_POS = 0x110, Register32

    CONVERT_COLOR_SPACE = 0x114, Register32

    REG_0x118 = 0x118, Register32
    REG_0x11c = 0x11c, Register32

    REG_0x120 = 0x120, Register32
    UNTESTED_SURFACE_TILING = 0x124, Register32     # this doesn't seem to work???
    REG_0x128 = 0x128, Register32
    REG_0x12c = 0x12c, Register32

    DECODE_MACROBLOCKS_W = 0x130, Register32
    DECODE_MACROBLOCKS_H = 0x134, Register32
    RIGHT_EDGE_PIXELS = 0x138, Register32
    BOTTOM_EDGE_PIXELS = 0x13c, Register32
    RIGHT_EDGE_SAMPLES = 0x140, Register32
    BOTTOM_EDGE_SAMPLES = 0x144, Register32

    SCALE_FACTOR = 0x148, Register32    # 0-3 --> /1 /2 /4 /8

    # 0 = YUV 444 (2P)
    # 1 = YUV 422 (2P)
    # 2 = YUV 420 (2P)
    # 3 = YUV 422 (1P)
    # 4 = driver mentions YUV10 444 (1P) but it does not appear to work (driver also says it doesn't)
    # 5 = RGB888
    # 6 = RGB565
    # 7 = driver mentions RGB101010 but it does not appear to work (driver also says it doesn't)
    DECODE_PIXEL_FORMAT = 0x14c, Register32
    YUV422_ORDER = 0x150, Register32    # 0 = Cb Y'0 Cr Y'1     1 = Y'0 Cb Y'1 Cr
    RGBA_ORDER = 0x154, Register32      # 0 = BGRA              1 = RGBA
    RGBA_ALPHA = 0x158, Register32

    REG_0x15c = 0x15c, Register32

    REG_0x160 = 0x160, Register32
    REG_0x164 = 0x164, Register32
    # REG_0x168 = 0x168, Register32
    REG_0x16c = 0x16c, Register32

    REG_0x170 = 0x170, Register32
    # REG_0x174 = 0x174, Register32
    PERFCOUNTER = 0x178, Register32     # guessed
    # REG_0x17c = 0x17c, Register32

    # REG_0x180 = 0x180, Register32
    TIMEOUT = 0x184, Register32
    HWREV = 0x188, Register32
    # REG_0x18c = 0x18c, Register32

    # REG_0x190 = 0x190, Register32
    # REG_0x194 = 0x194, Register32
    # REG_0x198 = 0x198, Register32
    REG_0x19c = 0x19c, Register32

    ENABLE_RST_LOGGING = 0x1a0, Register32
    RST_LOG_ENTRIES = 0x1a4, Register32

    REG_0x1a8 = 0x1a8, Register32
    REG_0x1ac = 0x1ac, Register32

    REG_0x1b0 = 0x1b0, Register32
    REG_0x1b4 = 0x1b4, Register32
    # REG_0x1b8 = 0x1b8, Register32
    REG_0x1bc = 0x1bc, Register32

    REG_0x1c0 = 0x1c0, Register32
    REG_0x1c4 = 0x1c4, Register32
    REG_0x1c8 = 0x1c8, Register32
    REG_0x1cc = 0x1cc, Register32

    REG_0x1d0 = 0x1d0, Register32
    REG_0x1d4 = 0x1d4, Register32
    # REG_0x1d8 = 0x1d8, Register32
    # REG_0x1dc = 0x1dc, Register32

    # REG_0x1e0 = 0x1e0, Register32
    # REG_0x1e4 = 0x1e4, Register32
    # REG_0x1e8 = 0x1e8, Register32
    # REG_0x1ec = 0x1ec, Register32

    # REG_0x1f0 = 0x1f0, Register32
    # REG_0x1f4 = 0x1f4, Register32
    # REG_0x1f8 = 0x1f8, Register32
    REG_0x1fc = 0x1fc, Register32

    REG_0x200 = 0x200, Register32
    REG_0x204 = 0x204, Register32
    REG_0x208 = 0x208, Register32
    REG_0x20c = 0x20c, Register32

    REG_0x210 = 0x210, Register32
    REG_0x214 = 0x214, Register32
    REG_0x218 = 0x218, Register32
    REG_0x21c = 0x21c, Register32

    REG_0x220 = 0x220, Register32
    REG_0x224 = 0x224, Register32
    REG_0x228 = 0x228, Register32
    REG_0x22c = 0x22c, Register32

    REG_0x230 = 0x230, Register32
    REG_0x234 = 0x234, Register32
    # REG_0x238 = 0x238, Register32
    REG_0x23c = 0x23c, Register32

    REG_0x240 = 0x240, Register32
    REG_0x244 = 0x244, Register32
    REG_0x248 = 0x248, Register32
    REG_0x24c = 0x24c, Register32

    REG_0x250 = 0x250, Register32
    REG_0x254 = 0x254, Register32
    REG_0x258 = 0x258, Register32
    REG_0x25c = 0x25c, Register32

    JPEG_IO_FLAGS = 0x1000, R_JPEG_IO_FLAGS
    REG_0x1004 = 0x1004, Register32
    # REG_0x1008 = 0x1008, Register32
    QTBL_SEL = 0x100c, R_QTBL_SEL

    HUFFMAN_TABLE = 0x1010, Register32  # fixme what _exactly_ does this control
    RST_INTERVAL = 0x1014, Register32     # 16 bits effective
    JPEG_HEIGHT = 0x1018, Register32
    JPEG_WIDTH = 0x101c, Register32

    COMPRESSED_BYTES = 0x1020, Register32
    JPEG_OUTPUT_FLAGS = 0x1024, R_JPEG_OUTPUT_FLAGS
    REG_0x1028 = 0x1028, Register32
    REG_0x102c = 0x102c, Register32

    BITSTREAM_CORRUPTION = 0x1030, Register32
    # REG_0x1034 = 0x1034, Register32
    # REG_0x1038 = 0x1038, Register32
    # REG_0x103c = 0x103c, Register32

    QTBL = irange(0x1100, 64, 4), Register32

    # todo what's the format?
    RSTLOG = irange(0x2000, 1024, 4), Register32


p.pmgr_adt_clocks_enable('/arm-io/dart-jpeg0')
p.pmgr_adt_clocks_enable('/arm-io/jpeg0')

dart = DART.from_adt(u, '/arm-io/dart-jpeg0')
dart.initialize()

jpeg_base, _ = u.adt['/arm-io/jpeg0'].get_reg(0)
jpeg = JPEGRegs(u, jpeg_base)


def reset_block():
    jpeg.MODE.val = 0x100
    jpeg.MODE.val = 0x13e

    set_default_regs()

    jpeg.MODE.val = 0x17f
    for _ in range(10000):
        v = jpeg.REG_0x1004.val
        if v == 0:
            break
        print(f"reset 1 -- {v}")
    if (v := jpeg.REG_0x1004.val) != 0:
        print(f"reset 1 failed! -- {v}")
        assert False

    jpeg.RST_INTERVAL.val = 1
    for _ in range(2500):
        v = jpeg.RST_INTERVAL.val
        if v == 1:
            break
        print(f"reset 2 -- {v}")
    if (v := jpeg.RST_INTERVAL.val) != 1:
        print(f"reset 2 failed! -- {v}")
        assert False
    jpeg.RST_INTERVAL.val = 0

    jpeg.ENABLE_RST_LOGGING.val = 0
    jpeg.REG_0x1a8.val = 0
    jpeg.REG_0x1ac.val = 0
    jpeg.REG_0x1b0.val = 0
    jpeg.REG_0x1b4.val = 0
    jpeg.REG_0x1bc.val = 0
    jpeg.REG_0x1c0.val = 0
    jpeg.REG_0x1c4.val = 0
    jpeg.REG_0x1c8.val = 0
    jpeg.REG_0x1cc.val = 0
    jpeg.REG_0x1d0.val = 0
    jpeg.REG_0x1d4.val = 0

    jpeg.MODE.val = 0x143

def set_default_regs(param1=0):
    jpeg.REG_0x0.val = 0
    jpeg.REG_0x0.val = 0
    jpeg.REG_0x4.val = 0
    jpeg.CODEC.val = 0
    jpeg.REG_0x2c.val = 0
    jpeg.REG_0x30.val = 0
    jpeg.REG_0x34.val = 1
    jpeg.REG_0x38.val = 1
    jpeg.CHROMA_HALVE_H_TYPE1.val = 0
    jpeg.CHROMA_HALVE_H_TYPE2.val = 0
    jpeg.CHROMA_HALVE_V_TYPE1.val = 0
    jpeg.CHROMA_HALVE_V_TYPE2.val = 0
    jpeg.CHROMA_DOUBLE_H.val = 0
    jpeg.CHROMA_QUADRUPLE_H.val = 0
    jpeg.CHROMA_DOUBLE_V.val = 0
    jpeg.REG_0x15c.val = 0
    jpeg.PX_USE_PLANE1.val = 0
    jpeg.PX_TILES_W.val = 1
    jpeg.PX_TILES_H.val = 1
    jpeg.PX_PLANE0_WIDTH.val = 1
    jpeg.PX_PLANE0_HEIGHT.val = 1
    jpeg.PX_PLANE0_TILING_H.val = 1
    jpeg.PX_PLANE0_TILING_V.val = 1
    jpeg.PX_PLANE0_STRIDE.val = 1
    jpeg.PX_PLANE1_WIDTH.val = 1
    jpeg.PX_PLANE1_HEIGHT.val = 1
    jpeg.PX_PLANE1_TILING_H.val = 1
    jpeg.PX_PLANE1_TILING_V.val = 1
    jpeg.PX_PLANE1_STRIDE.val = 1
    jpeg.INPUT_START1.val = 0
    jpeg.INPUT_START2.val = 0
    jpeg.REG_0x94.val = 1
    jpeg.REG_0x98.val = 1
    jpeg.INPUT_END.val = 0xffffffff
    jpeg.OUTPUT_START1.val = 0
    jpeg.OUTPUT_START2.val = 0
    jpeg.OUTPUT_END.val = 0xffffffff
    for i in range(11):
        jpeg.MATRIX_MULT[i].val = 0
    for i in range(10):
        jpeg.DITHER[i].val = 0xff
    jpeg.ENCODE_PIXEL_FORMAT.val = 0
    jpeg.ENCODE_COMPONENT0_POS.val = 0
    jpeg.ENCODE_COMPONENT1_POS.val = 0
    jpeg.ENCODE_COMPONENT2_POS.val = 0
    jpeg.ENCODE_COMPONENT3_POS.val = 0
    jpeg.CONVERT_COLOR_SPACE.val = 0
    jpeg.REG_0x118.val = 0
    jpeg.REG_0x11c.val = 0
    jpeg.REG_0x120.val = 0
    jpeg.UNTESTED_SURFACE_TILING.val = 0
    jpeg.REG_0x128.val = 0
    jpeg.REG_0x12c.val = 0
    jpeg.DECODE_MACROBLOCKS_W.val = 0
    jpeg.DECODE_MACROBLOCKS_H.val = 0
    jpeg.SCALE_FACTOR.val = 0
    jpeg.DECODE_PIXEL_FORMAT.val = 0
    jpeg.YUV422_ORDER.val = 0
    jpeg.RGBA_ORDER.val = 0
    jpeg.RGBA_ALPHA.val = 0
    jpeg.RIGHT_EDGE_PIXELS.val = 0
    jpeg.BOTTOM_EDGE_PIXELS.val = 0
    jpeg.RIGHT_EDGE_SAMPLES.val = 0
    jpeg.BOTTOM_EDGE_SAMPLES.val = 0

    # this is always done on the m1 max hwrev
    jpeg.REG_0x1fc.val = 0
    jpeg.REG_0x200.val = 0
    jpeg.REG_0x204.val = 0
    jpeg.REG_0x208.val = 0
    jpeg.REG_0x214.val = 0
    jpeg.REG_0x218.val = 0
    jpeg.REG_0x21c.val = 0
    jpeg.REG_0x220.val = 0
    jpeg.REG_0x224.val = 0
    jpeg.REG_0x228.val = 0
    jpeg.REG_0x22c.val = 0
    jpeg.REG_0x230.val = 0
    jpeg.REG_0x234.val = 0x1f40
    jpeg.REG_0x244.val = 0
    jpeg.REG_0x248.val = 0
    jpeg.REG_0x258.val = 0
    jpeg.REG_0x25c.val = 0
    jpeg.REG_0x23c.val = 0
    jpeg.REG_0x240.val = 0
    jpeg.REG_0x250.val = 0
    jpeg.REG_0x254.val = 0

    jpeg.REG_0x160.val = param1
    jpeg.TIMEOUT.val = 0
    jpeg.REG_0x20.val = 0xff

def decode(input_iova, input_sz, output_iova, output_sz):
    jpeg.REG_0x34.val = 1
    jpeg.REG_0x2c.val = 0
    jpeg.REG_0x38.val = 0
    jpeg.CODEC.val = 0
    jpeg.DECODE_PIXEL_FORMAT.val = 5

    # image boundary
    jpeg.PX_USE_PLANE1.val = 1
    jpeg.PX_PLANE0_WIDTH.val = jpeg_W*4 - 1
    jpeg.PX_PLANE0_HEIGHT.val = jpeg_H - 1
    # jpeg.PX_PLANE1_WIDTH.val = jpeg_W * 2 - 1     # HACK HACK
    # jpeg.PX_PLANE1_HEIGHT.val = jpeg_H - 1
    # jpeg.TIMEOUT.val = 100200
    jpeg.TIMEOUT.val = 266000000

    jpeg.REG_0x94.val = 0x1f
    jpeg.REG_0x98.val = 1

    # jpeg.CHROMA_HALVE_H_TYPE1.val = 1
    # jpeg.YUV422_ORDER.val = 1
    # jpeg.UNTESTED_SURFACE_TILING = 1

    # wrcnvset_w_h
    jpeg.DECODE_MACROBLOCKS_W.val = divroundup(jpeg_W, 8)
    jpeg.DECODE_MACROBLOCKS_H.val = divroundup(jpeg_H, 8)
    right_edge_px = jpeg_W - divroundup(jpeg_W, 8)*8 + 8
    bot_edge_px = jpeg_H - divroundup(jpeg_H, 8)*8 + 8
    # XXX changing this does not seem to do anything
    jpeg.RIGHT_EDGE_PIXELS.val = right_edge_px
    jpeg.BOTTOM_EDGE_PIXELS.val = bot_edge_px
    jpeg.RIGHT_EDGE_SAMPLES.val = right_edge_px // 2
    jpeg.BOTTOM_EDGE_SAMPLES.val = bot_edge_px // 2

    jpeg.PX_TILES_H.val = divroundup(jpeg_H, 8)
    jpeg.PX_TILES_W.val = divroundup(jpeg_W, 8)
    jpeg.PX_PLANE0_TILING_H.val = 4
    jpeg.PX_PLANE0_TILING_V.val = 8
    jpeg.PX_PLANE1_TILING_H.val = 0
    jpeg.PX_PLANE1_TILING_V.val = 0

    jpeg.MATRIX_MULT[0].val = 0x100
    jpeg.MATRIX_MULT[1].val = 0x0
    jpeg.MATRIX_MULT[2].val = 0x167
    jpeg.MATRIX_MULT[3].val = 0x100
    jpeg.MATRIX_MULT[4].val = 0xffffffa8
    jpeg.MATRIX_MULT[5].val = 0xffffff49
    jpeg.MATRIX_MULT[6].val = 0x100
    jpeg.MATRIX_MULT[7].val = 0x1c6
    jpeg.MATRIX_MULT[8].val = 0x0
    jpeg.MATRIX_MULT[9].val = 0x0
    jpeg.MATRIX_MULT[10].val = 0xffffff80

    # submode
    jpeg.RGBA_ALPHA.val = 0xff
    jpeg.RGBA_ORDER.val = 1

    jpeg.SCALE_FACTOR.val = 0

    # pointers
    jpeg.INPUT_START1.val = input_iova
    jpeg.INPUT_START2.val = 0xdeadbeef
    jpeg.INPUT_END.val = input_iova + input_sz
    jpeg.OUTPUT_START1.val = output_iova
    jpeg.OUTPUT_START2.val = output_iova + jpeg_W * 4   # HACK
    jpeg.OUTPUT_END.val = output_iova + output_sz
    jpeg.PX_PLANE0_STRIDE.val = output_W * 4
    jpeg.PX_PLANE1_STRIDE.val = output_W * 4    # HACK

    jpeg.REG_0x1ac.val = 0x0
    jpeg.REG_0x1b0.val = 0x0
    jpeg.REG_0x1b4.val = 0x0
    jpeg.REG_0x1bc.val = 0x0
    jpeg.REG_0x1c0.val = 0x0
    jpeg.REG_0x1c4.val = 0x0

    jpeg.REG_0x118.val = 0x0
    jpeg.REG_0x11c.val = 0x1

    jpeg.MODE.val = 0x177
    jpeg.REG_0x1028.val = 0x400

    jpeg.JPEG_IO_FLAGS.val = 0x1f
    jpeg.REG_0x0.val = 0x1
    jpeg.REG_0x1004.val = 0x1
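
# A minimal sketch (an addition, not part of the original experiment script)
# of how the MATRIX_MULT[0..8] values written in decode() above can be derived.
# It assumes the hardware takes signed coefficients with 8 fractional bits,
# stored as 32-bit two's complement. That is inferred from the constants
# (0x100 acting as 1.0) and is not confirmed. MATRIX_MULT[9] and [10] look like
# offsets rather than coefficients, which is also a guess. The helper names
# below are hypothetical.
def to_fixed_8_8(coeff):
    """Convert a float to a 32-bit two's-complement word with 8 fractional bits."""
    return round(coeff * 256) & 0xffffffff

def from_fixed_8_8(word):
    """Interpret a 32-bit register word as a signed value with 8 fractional bits."""
    if word & 0x80000000:
        word -= 1 << 32
    return word / 256

# JFIF (full-range BT.601) YCbCr -> RGB coefficients, row by row (R, G, B).
YCBCR_TO_RGB = [
    1.0, 0.0, 1.402,
    1.0, -0.344136, -0.714136,
    1.0, 1.772, 0.0,
]
ycbcr_to_rgb_words = [to_fixed_8_8(c) for c in YCBCR_TO_RGB]
# ycbcr_to_rgb_words[2] is 0x167 and ycbcr_to_rgb_words[5] is 0xffffff49,
# the same values that decode() writes.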

def encode(input_iova, input_sz, output_iova, output_sz):
    jpeg.MODE.val = 0x17f
    jpeg.REG_0x38.val = 0x1 # if not set nothing happens
    jpeg.REG_0x2c.val = 0x1 # if not set only header is output
    jpeg.REG_0x34.val = 0x0 # if set output is a JPEG but weird with no footer
    jpeg.CODEC.val = 0

    jpeg.PX_USE_PLANE1.val = 0x0
    jpeg.PX_PLANE0_WIDTH.val = W*4 - 1
    jpeg.PX_PLANE0_HEIGHT.val = H - 1
    jpeg.PX_PLANE1_WIDTH.val = 0xffffffff
    jpeg.PX_PLANE1_HEIGHT.val = 0xffffffff
    jpeg.TIMEOUT.val = 0xfdad680    # 266000000, same value as in decode()

    jpeg.PX_TILES_W.val = divroundup(W, 8)
    jpeg.PX_TILES_H.val = divroundup(H, 8)
    jpeg.PX_PLANE0_TILING_H.val = 0x4
    jpeg.PX_PLANE0_TILING_V.val = 0x8
    jpeg.PX_PLANE0_STRIDE.val = W*4
    jpeg.PX_PLANE1_STRIDE.val = 0

    # none of this seems to affect anything????
    jpeg.REG_0x94.val = 0xc     # c/2 for 444; 8/2 for 422; 3/1 for 411; b/2 for 400
    jpeg.REG_0x98.val = 0x2
    jpeg.REG_0x20c.val = W
    jpeg.REG_0x210.val = H

    jpeg.CONVERT_COLOR_SPACE.val = 1
    jpeg.MATRIX_MULT[0].val = 0x4d
    jpeg.MATRIX_MULT[1].val = 0x96
    jpeg.MATRIX_MULT[2].val = 0x1d
    jpeg.MATRIX_MULT[3].val = 0xffffffd5
    jpeg.MATRIX_MULT[4].val = 0xffffffab
    jpeg.MATRIX_MULT[5].val = 0x80
    jpeg.MATRIX_MULT[6].val = 0x80
    jpeg.MATRIX_MULT[7].val = 0xffffff95
    jpeg.MATRIX_MULT[8].val = 0xffffffeb
    jpeg.MATRIX_MULT[9].val = 0x0
    jpeg.MATRIX_MULT[10].val = 0x80
    # jpeg.MATRIX_MULT[0].val = 0x80
    # jpeg.MATRIX_MULT[1].val = 0
    # jpeg.MATRIX_MULT[2].val = 0
    # jpeg.MATRIX_MULT[3].val = 0
    # jpeg.MATRIX_MULT[4].val = 0x80
    # jpeg.MATRIX_MULT[5].val = 0
    # jpeg.MATRIX_MULT[6].val = 0
    # jpeg.MATRIX_MULT[7].val = 0
    # jpeg.MATRIX_MULT[8].val = 0x80
    # jpeg.MATRIX_MULT[9].val = 0
    # jpeg.MATRIX_MULT[10].val = 0x80

    jpeg.ENCODE_PIXEL_FORMAT.val = 2
    jpeg.ENCODE_COMPONENT0_POS.val = 0
    jpeg.ENCODE_COMPONENT1_POS.val = 1
    jpeg.ENCODE_COMPONENT2_POS.val = 2
    jpeg.ENCODE_COMPONENT3_POS.val = 3

    jpeg.INPUT_START1.val = input_iova
    jpeg.INPUT_START2.val = input_iova
    jpeg.INPUT_END.val = input_iova + input_sz + 7 # NOTE +7
    jpeg.OUTPUT_START1.val = output_iova
    jpeg.OUTPUT_START2.val = 0xdeadbeef
    jpeg.OUTPUT_END.val = output_iova + output_sz

    jpeg.REG_0x118.val = 0x1
    jpeg.REG_0x11c.val = 0x0

    jpeg.ENABLE_RST_LOGGING.val = 1

    jpeg.MODE.val = 0x16f
    jpeg.JPEG_IO_FLAGS.val = 0x30
    jpeg.JPEG_WIDTH.val = W
    jpeg.JPEG_HEIGHT.val = H
    jpeg.RST_INTERVAL.val = 0
    jpeg.JPEG_OUTPUT_FLAGS.val = 0


    jpeg.QTBL[0].val =  0xa06e64a0
    jpeg.QTBL[1].val =  0xf0ffffff
    jpeg.QTBL[2].val =  0x78788cbe
    jpeg.QTBL[3].val =  0xffffffff
    jpeg.QTBL[4].val =  0x8c82a0f0
    jpeg.QTBL[5].val =  0xffffffff
    jpeg.QTBL[6].val =  0x8caadcff
    jpeg.QTBL[7].val =  0xffffffff
    jpeg.QTBL[8].val =  0xb4dcffff
    jpeg.QTBL[9].val =  0xffffffff
    jpeg.QTBL[10].val = 0xf0ffffff
    jpeg.QTBL[11].val = 0xffffffff
    jpeg.QTBL[12].val = 0xffffffff
    jpeg.QTBL[13].val = 0xffffffff
    jpeg.QTBL[14].val = 0xffffffff
    jpeg.QTBL[15].val = 0xffffffff

    jpeg.QTBL[16].val = 0xaab4f0ff
    jpeg.QTBL[17].val = 0xffffffff
    jpeg.QTBL[18].val = 0xb4d2ffff
    jpeg.QTBL[19].val = 0xffffffff
    jpeg.QTBL[20].val = 0xf0ffffff
    jpeg.QTBL[21].val = 0xffffffff
    jpeg.QTBL[22].val = 0xffffffff
    jpeg.QTBL[23].val = 0xffffffff
    jpeg.QTBL[24].val = 0xffffffff
    jpeg.QTBL[25].val = 0xffffffff
    jpeg.QTBL[26].val = 0xffffffff
    jpeg.QTBL[27].val = 0xffffffff
    jpeg.QTBL[28].val = 0xffffffff
    jpeg.QTBL[29].val = 0xffffffff
    jpeg.QTBL[30].val = 0xffffffff
    jpeg.QTBL[31].val = 0xffffffff

    jpeg.QTBL[32].val = 0x01010201
    jpeg.QTBL[33].val = 0x01020202
    jpeg.QTBL[34].val = 0x02030202
    jpeg.QTBL[35].val = 0x03030604
    jpeg.QTBL[36].val = 0x03030303
    jpeg.QTBL[37].val = 0x07050804
    jpeg.QTBL[38].val = 0x0608080a
    jpeg.QTBL[39].val = 0x0908070b
    jpeg.QTBL[40].val = 0x080a0e0d
    jpeg.QTBL[41].val = 0x0b0a0a0c
    jpeg.QTBL[42].val = 0x0a08080b
    jpeg.QTBL[43].val = 0x100c0c0d
    jpeg.QTBL[44].val = 0x0f0f0f0f
    jpeg.QTBL[45].val = 0x090b1011
    jpeg.QTBL[46].val = 0x0f0e110d
    jpeg.QTBL[47].val = 0x0e0e0e01

    jpeg.QTBL[48].val = 0x04040405
    jpeg.QTBL[49].val = 0x04050905
    jpeg.QTBL[50].val = 0x05090f0a
    jpeg.QTBL[51].val = 0x080a0f1a
    jpeg.QTBL[52].val = 0x13090913
    jpeg.QTBL[53].val = 0x1a1a1a1a
    jpeg.QTBL[54].val = 0x0d1a1a1a
    jpeg.QTBL[55].val = 0x1a1a1a1a
    jpeg.QTBL[56].val = 0x1a1a1a1a
    jpeg.QTBL[57].val = 0x1a1a1a1a
    jpeg.QTBL[58].val = 0x1a1a1a1a
    jpeg.QTBL[59].val = 0x1a1a1a1a
    jpeg.QTBL[60].val = 0x1a1a1a1a
    jpeg.QTBL[61].val = 0x1a1a1a1a
    jpeg.QTBL[62].val = 0x1a1a1a1a
    jpeg.QTBL[63].val = 0x1a1a1a1a

    jpeg.HUFFMAN_TABLE.val = 0x3c
    jpeg.QTBL_SEL.val = 0xff
    jpeg.REG_0x0.val = 0x1
    jpeg.REG_0x1004.val = 0x1
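
# A minimal sketch (an addition, not taken from the driver): QTBL[0..15] in
# encode() above match the example luminance table from ITU-T T.81 Annex K
# with every entry multiplied by 10, saturated to 255, taken row by row and
# packed four entries per 32-bit word with the first entry in the most
# significant byte. QTBL[16..31] relate to the Annex K chrominance table in
# the same way. This reading is inferred purely from the constants; the scale
# parameter and the helper name pack_qtbl are assumptions for illustration.
ANNEX_K_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]

def pack_qtbl(table, scale=10):
    """Scale a 64-entry table, clamp entries to 1..255, and pack into 16 words."""
    scaled = [max(1, min(255, q * scale)) for q in table]
    return [
        (scaled[i] << 24) | (scaled[i + 1] << 16) | (scaled[i + 2] << 8) | scaled[i + 3]
        for i in range(0, 64, 4)
    ]

luma_words = pack_qtbl(ANNEX_K_LUMA)
# luma_words[0] is 0xa06e64a0 and luma_words[2] is 0x78788cbe, as written above.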


print(f"HW revision is {jpeg.HWREV}")
reset_block()

##### THIS IS FOR ENCODE
input_buf_phys = u.heap.memalign(0x4000, input_sz_aligned)
output_buf_phys = u.heap.memalign(0x4000, output_sz_aligned)
print(f"buffers (phys) {input_buf_phys:016X} {output_buf_phys:016X}")

input_buf_iova = dart.iomap(0, input_buf_phys, input_sz_aligned)
output_buf_iova = dart.iomap(0, output_buf_phys, output_sz_aligned)
print(f"buffers (iova) {input_buf_iova:08X} {output_buf_iova:08X}")
dart.dump_all()

iface.writemem(input_buf_phys, image_data + b'\xAA' * (input_sz_aligned - len(image_data)))
iface.writemem(output_buf_phys, b'\xAA' * output_sz_aligned)

encode(input_buf_iova, input_sz_aligned, output_buf_iova, output_sz_aligned)

time.sleep(1)

print(jpeg.STATUS.reg)
print(jpeg.PERFCOUNTER.reg)
print(jpeg.COMPRESSED_BYTES.reg)
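
# A minimal sketch (an addition to the original script) that turns a raw
# STATUS value into readable flag names, using the bit meanings listed in the
# STATUS (+0x0024) section of these notes. Several of those meanings are
# themselves marked as uncertain there. STATUS_BIT_NAMES and describe_status
# are hypothetical names.
STATUS_BIT_NAMES = [
    "operation completed (?)",       # bit0
    "timeout",                       # bit1
    "read buffer overflow",          # bit2
    "write buffer overflow",         # bit3
    "codec buffer overflow",         # bit4
    "error (macroblock settings?)",  # bit5
    "AXI error",                     # bit6
    "unknown (checked by driver)",   # bit7
]

def describe_status(value):
    """Return the names of all bits set in the low 8 bits of a STATUS value."""
    return [name for bit, name in enumerate(STATUS_BIT_NAMES) if value & (1 << bit)]

# e.g. print(describe_status(jpeg.STATUS.val))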

# print(jpeg.RST_LOG_ENTRIES.val)
# print(jpeg.RSTLOG[0].val)
# print(jpeg.RSTLOG[1].val)
# print(jpeg.RSTLOG[2].val)
# print(jpeg.RSTLOG[3].val)

output_data = iface.readmem(output_buf_phys, output_sz_aligned)

with open('jpegblockout.bin', 'wb') as f:
    f.write(output_data)
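
# A minimal sketch (an addition to the original script): the dump above holds
# the whole output buffer, including the 0xAA fill written before encode().
# Assuming the hardware emits a complete stream delimited by the standard
# JPEG SOI (FF D8) and EOI (FF D9) markers, the image can be cut out like
# this. trim_jpeg is a hypothetical helper name.
def trim_jpeg(buf):
    """Return bytes from the first SOI marker through the last EOI marker, or None."""
    start = buf.find(b'\xff\xd8')
    end = buf.rfind(b'\xff\xd9')
    if start < 0 or end < start:
        return None
    return buf[start:end + 2]

# e.g. trimmed = trim_jpeg(output_data); write it out only if it is not None.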

# #### THIS IS FOR DECODE
# input_buf_phys = u.heap.memalign(0x4000, jpeg_sz_aligned)
# output_buf_phys = u.heap.memalign(0x4000, output_img_sz_aligned)
# print(f"buffers (phys) {input_buf_phys:016X} {output_buf_phys:016X}")

# input_buf_iova = dart.iomap(0, input_buf_phys, jpeg_sz_aligned)
# output_buf_iova = dart.iomap(0, output_buf_phys, output_img_sz_aligned)
# print(f"buffers (iova) {input_buf_iova:08X} {output_buf_iova:08X}")
# dart.dump_all()

# # jpeg_data += b'\xaa' * (8 - (len(jpeg_data) % 8))
# # assert len(jpeg_data) % 8 == 0
# # jpeg_data_2 = b''
# # for i in range(len(jpeg_data) // 8):
# #     jpeg_data_2 += jpeg_data[i*8:(i+1)*8][::-1]
# # jpeg_data = jpeg_data_2

# iface.writemem(input_buf_phys, jpeg_data)
# iface.writemem(output_buf_phys, b'\xAA' * output_img_sz_aligned)
# print("JPEG uploaded")

# decode(input_buf_iova, jpeg_sz_aligned, output_buf_iova, output_img_sz_aligned)

# time.sleep(1)

# print(jpeg.STATUS.reg)
# print(jpeg.PERFCOUNTER.reg)

# output_data = iface.readmem(output_buf_phys, output_img_sz_aligned)
# with open('testtest.bin', 'wb') as f:
#     f.write(output_data)

# img = Image.new(mode='RGBA', size=(output_W * 4, output_H * 4))
# draw = ImageDraw.Draw(img)
# output_elemsz = 4
# output_stride = output_W * 4
# for y in range(output_H):
#     for x in range(output_W):
#         block = output_data[y*output_stride + x*output_elemsz:y*output_stride + (x+1)*output_elemsz]

#         r, g, b, a = block
#         # val = struct.unpack("<H", block)[0]
#         # r = (val >> 11 & 0x1F) * 8
#         # g = (val >> 5 & 0x3F) * 4
#         # b = (val & 0x1F) * 8
#         # a = 255

#         # img.putpixel((x, y), (r, g, b, a))
#         draw.rectangle((x*4,y*4,(x+1)*4,(y+1)*4), fill=(r, g, b, a))


#         # cb, y0, cr, y1 = block

#         # y0 -= 16
#         # y1 -= 16
#         # cb -= 128
#         # cr -= 128

#         # cb /= 255
#         # y0 /= 255
#         # cr /= 255
#         # y1 /= 255

#         # r0 = y0 + 1.13983 * cr
#         # g0 = y0 - 0.39465 * cb - 0.58060 * cr
#         # b0 = y0 + 2.03211 * cb
#         # r1 = y1 + 1.13983 * cr
#         # g1 = y1 - 0.39465 * cb - 0.58060 * cr
#         # b1 = y1 + 2.03211 * cb

#         # r0 = min(255, max(0, int(r0 * 255)))
#         # g0 = min(255, max(0, int(g0 * 255)))
#         # b0 = min(255, max(0, int(b0 * 255)))
#         # r1 = min(255, max(0, int(r1 * 255)))
#         # g1 = min(255, max(0, int(g1 * 255)))
#         # b1 = min(255, max(0, int(b1 * 255)))

#         # draw.rectangle((x*2*4,y*4,(x*2+1)*4,(y+1)*4), fill=(r0, g0, b0, 255))
#         # draw.rectangle(((x*2+1)*4,y*4,(x*2+1+1)*4,(y+1)*4), fill=(r1, g1, b1, 255))


# for y in range(output_H // 4):
#     for x in range(output_W // 4):
#         draw.rectangle((x*4*8,y*4*8,(x+1)*4*8,(y+1)*4*8), outline=(0, 0, 0, 255))
# img.save('testtest.png')
# # img.show()