@mildsunrise
Last active February 13, 2025 09:27
Documentation of Tuya's weird compression scheme for IR codes

Tuya's IR blasters, like the ZS08, have the ability to both learn and blast generic IR codes. These IR codes are given to the user as an opaque string, like this:

A/IEiwFAAwbJAfIE8gSLIAUBiwFAC+ADAwuLAfIE8gSLAckBRx9AB0ADBskB8gTyBIsgBQGLAUALA4sB8gRAB8ADBfIEiwHJAeARLwHJAeAFAwHyBOC5LwGLAeA97wOLAfIE4RcfBYsB8gTyBEAFAYsB4AcrCYsB8gTyBIsByQHgPY8DyQHyBOAHAwHyBEAX4BVfBIsB8gTJoAMF8gSLAckB4BUvAckB4AEDBfIEiwHJAQ==

Not much is known about the format of these IR code strings. That makes it difficult to use codes obtained through other means (such as a manual implementation of the IR protocol for a particular device, or public Internet code tables) with these blasters. It also makes it hard to use codes learnt through these blasters with other brands of blasters, or to study their contents.

So far I've only been able to find one person who dug into this before me, who was able to understand it enough to create their own codes to blast, but not enough to understand codes learnt by the device.

This document attempts to fully document the format and also provides a (hopefully) working Python implementation.

Overview

There is no standard for IR codes, so appliances use different methods to encode the data into an IR signal, often called "IR protocols". A popular one, which could be considered an unofficial standard, is the NEC protocol. NEC specifies a way to encode 16 bits as a series of pulses of modulated IR light, but it's just one protocol.

Tuya's IR blasters are meant to be generic and work with just about any protocol. To do that, they work at a lower level and record the IR signal directly instead of detecting a particular protocol and decoding the bits. In particular, the blaster records a binary signal like this one:

  +------+     +----------+  +-+
  |      |     |          |  | |
--+      +-----+          +--+ +---

Such a signal can be represented by noting the times at which the signal flips from low to high and vice versa. It is better to record the differences between these times, as they will be smaller numbers. For example, the above signal is represented as:

[7, 6, 11, 3, 2]

Meaning, the signal stays high for 7 units of time, then low for 6 units of time, then high for 11 units, and so on. The first time is always for a high state, which means even times (2nd, 4th, 6th...) are always low periods while odd times (1st, 3rd, 5th...) are always high periods.
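
Turning transition times into durations is just a running difference. Here is a minimal sketch, assuming the signal goes high at time 0; the `edges` list and the function name are made up for this example, chosen to match the diagram above:

```python
def durations_from_edges(edges: list[int]) -> list[int]:
    """Turn absolute transition times into the durations between them."""
    return [later - earlier for earlier, later in zip(edges, edges[1:])]

# Hypothetical transition times: rises at 0, falls at 7, rises at 13, ...
edges = [0, 7, 13, 24, 27, 29]
durations = durations_from_edges(edges)
print(durations)

# Odd positions (1st, 3rd, ...) are high periods, even positions are low periods.
high_periods = durations[0::2]
low_periods = durations[1::2]
```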

The blaster takes these numbers (in units of microseconds) and encodes each of them as a little-endian 16-bit integer, resulting in the following 10 bytes:

07 00 06 00 0B 00 03 00 02 00

Because we're recording a signal rather than high-level protocol data, this results in very long messages in real life. So, the blaster compresses these bytes using a weird algorithm (see below), and then encodes the resulting bytes using base64 so the user can copy/paste the code easily.
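
The packing step can be reproduced with `struct`. The sketch below also builds a complete code string without any real compression, assuming the literal-block framing described in the compression section (a header byte holding the length minus one, followed by up to 32 bytes). `pack_timings` and `encode_uncompressed` are names invented for this example, and whether a particular blaster accepts such a code is not something the sketch verifies.

```python
import base64
from struct import pack

def pack_timings(signal: list[int]) -> bytes:
    """Encode each duration (µs) as a little-endian unsigned 16-bit integer."""
    return b''.join(pack('<H', t) for t in signal)

def encode_uncompressed(signal: list[int]) -> str:
    """Wrap the packed bytes in literal blocks only, then base64-encode."""
    payload = pack_timings(signal)
    stream = bytearray()
    for i in range(0, len(payload), 32):
        chunk = payload[i:i + 32]
        stream.append(len(chunk) - 1)   # header byte: 000 + (length - 1)
        stream.extend(chunk)
    return base64.b64encode(bytes(stream)).decode('ascii')

print(pack_timings([7, 6, 11, 3, 2]).hex(' '))
print(encode_uncompressed([7, 6, 11, 3, 2]))
```

The first line printed is the 10-byte sequence shown above; the second is that sequence wrapped in a single literal block and base64-encoded.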

Compression scheme

Update: Turns out this is FastLZ compression. No need to read this section, you can go to their website instead.

I was unable to find a public algorithm that matched this, so I'm assuming it's a custom lossless compression algorithm that a random Tuya employee hacked to make my life more complicated. Jokes aside it seems to be doing a very poor job, and if I were them I would've just used Huffman coding or something.

Anyway, the algorithm is LZ77-based, with a fixed 8kB window. The stream contains a series of blocks. Each block begins with a "header byte", and the 3 MSBs of this byte determine the type of block:

  • If the 3 bits are zero, then this is a literal block and the other 5 bits specify a length L minus one.

    Upon encountering this block, the decoder consumes the next L bytes from the stream and emits them as output.

    +--------+-----------------------------+
    |000LLLLL| 1..32 bytes, depending on L |
    +--------+-----------------------------+
    
  • If the 3 bits have any other value, then this is a length-distance pair block; the 3 bits specify a length L minus 2, and the concatenation of the other 5 bits with the next byte specifies a distance D minus 1.

    Upon encountering this block, the decoder copies L bytes from the previous output. It begins copying D bytes before the output cursor, so if D = 1, the first copied byte is the most recently emitted byte; if D = 2, the byte before that one, and so on.

    As usual, it may happen that L > D, in which case the output repeats as necessary (for example if L = 5 and D = 2, and the 2 last emitted bytes are X and Y, the decoder would emit XYXYX).

    +--------+--------+
    |LLLDDDDD|DDDDDDDD|
    +--------+--------+
    

    As a special case, if the 3 bits are all ones (binary 111), then there's an extra byte preceding the distance byte, which specifies a value to be added to L:

    +--------+--------+--------+
    |111DDDDD|LLLLLLLL|DDDDDDDD|
    +--------+--------+--------+
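
To make the block layout concrete, here is a small tracing decoder that reports each block as it is read. It is a sketch of the rules above rather than the reference implementation, and `trace_blocks` is a name invented for this example. The sample bytes are the first 12 bytes of a learnt code after base64 decoding.

```python
from struct import unpack

def trace_blocks(stream: bytes) -> tuple[list[str], bytes]:
    """Decode a stream block by block, returning a description of each
    block together with the decompressed output."""
    out = bytearray()
    log = []
    i = 0
    while i < len(stream):
        header = stream[i]
        i += 1
        top, low = header >> 5, header & 0b11111
        if top == 0:
            # literal block: copy the next L bytes verbatim
            length = low + 1
            out += stream[i:i + length]
            i += length
            log.append(f'literal L={length}')
        else:
            # length-distance pair block
            length = top
            if top == 7:                 # extended form: extra length byte
                length += stream[i]
                i += 1
            length += 2
            distance = (low << 8 | stream[i]) + 1
            i += 1
            for _ in range(length):      # byte-wise copy also handles L > D
                out.append(out[-distance])
            log.append(f'copy L={length} D={distance}')
    return log, bytes(out)

sample = bytes.fromhex('05 51 23 ac 11 26 02 40 01 01 ba 06')
log, data = trace_blocks(sample)
print(log)
print(unpack(f'<{len(data) // 2}H', data))
```

Here the header `05` announces a 6-byte literal, `40 01` copies 4 bytes starting 2 bytes back (repeating the last 16-bit value twice), and `01` announces a 2-byte literal.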
    
import io
import base64
from bisect import bisect
from struct import pack, unpack

# MAIN API

def decode_ir(code: str) -> list[int]:
    '''
    Decodes an IR code string from a Tuya blaster.
    Returns the IR signal as a list of µs durations,
    with the first duration belonging to a high state.
    '''
    payload = base64.decodebytes(code.encode('ascii'))
    payload = decompress(io.BytesIO(payload))

    signal = []
    while payload:
        assert len(payload) >= 2, \
            f'garbage in decompressed payload: {payload.hex()}'
        signal.append(unpack('<H', payload[:2])[0])
        payload = payload[2:]
    return signal

def encode_ir(signal: list[int], compression_level=2) -> str:
    '''
    Encodes an IR signal (see `decode_ir`)
    into an IR code string for a Tuya blaster.
    '''
    payload = b''.join(pack('<H', t) for t in signal)
    compress(out := io.BytesIO(), payload, compression_level)
    payload = out.getvalue()
    return base64.encodebytes(payload).decode('ascii').replace('\n', '')

# DECOMPRESSION

def decompress(inf: io.FileIO) -> bytes:
    '''
    Reads a "Tuya stream" from a binary file,
    and returns the decompressed byte string.
    '''
    out = bytearray()

    while (header := inf.read(1)):
        L, D = header[0] >> 5, header[0] & 0b11111
        if not L:
            # literal block
            L = D + 1
            data = inf.read(L)
            assert len(data) == L
        else:
            # length-distance pair block
            if L == 7:
                L += inf.read(1)[0]
            L += 2
            D = (D << 8 | inf.read(1)[0]) + 1
            data = bytearray()
            while len(data) < L:
                data.extend(out[-D:][:L-len(data)])
        out.extend(data)

    return bytes(out)

# COMPRESSION

def emit_literal_blocks(out: io.FileIO, data: bytes):
    for i in range(0, len(data), 32):
        emit_literal_block(out, data[i:i+32])

def emit_literal_block(out: io.FileIO, data: bytes):
    length = len(data) - 1
    assert 0 <= length < (1 << 5)
    out.write(bytes([length]))
    out.write(data)

def emit_distance_block(out: io.FileIO, length: int, distance: int):
    distance -= 1
    assert 0 <= distance < (1 << 13)
    length -= 2
    assert length > 0
    block = bytearray()
    if length >= 7:
        assert length - 7 < (1 << 8)
        block.append(length - 7)
        length = 7
    block.insert(0, length << 5 | distance >> 8)
    block.append(distance & 0xFF)
    out.write(block)

def compress(out: io.FileIO, data: bytes, level=2):
    '''
    Takes a byte string and outputs a compressed "Tuya stream".

    Implemented compression levels:
    0 - copy over (no compression, 3.1% overhead)
    1 - eagerly use first length-distance pair found (linear)
    2 - eagerly use best length-distance pair found
    3 - optimal compression (n^3)
    '''
    if level == 0:
        return emit_literal_blocks(out, data)

    W = 2**13 # window size
    L = 255+9 # maximum length
    distance_candidates = lambda: range(1, min(pos, W) + 1)

    def find_length_for_distance(start: int) -> int:
        length = 0
        limit = min(L, len(data) - pos)
        while length < limit and data[pos + length] == data[start + length]:
            length += 1
        return length
    find_length_candidates = lambda: \
        ( (find_length_for_distance(pos - d), d) for d in distance_candidates() )
    find_length_cheap = lambda: \
        next((c for c in find_length_candidates() if c[0] >= 3), None)
    find_length_max = lambda: \
        max(find_length_candidates(), key=lambda c: (c[0], -c[1]), default=None)

    if level >= 2:
        suffixes = []; next_pos = 0
        key = lambda n: data[n:]
        find_idx = lambda n: bisect(suffixes, key(n), key=key)
        def distance_candidates():
            nonlocal next_pos
            while next_pos <= pos:
                if len(suffixes) == W:
                    suffixes.pop(find_idx(next_pos - W))
                suffixes.insert(idx := find_idx(next_pos), next_pos)
                next_pos += 1
            idxs = (idx+i for i in (+1,-1)) # try +1 first
            return (pos - suffixes[i] for i in idxs if 0 <= i < len(suffixes))

    if level <= 2:
        find_length = { 1: find_length_cheap, 2: find_length_max }[level]
        block_start = pos = 0
        while pos < len(data):
            if (c := find_length()) and c[0] >= 3:
                emit_literal_blocks(out, data[block_start:pos])
                emit_distance_block(out, c[0], c[1])
                pos += c[0]
                block_start = pos
            else:
                pos += 1
        emit_literal_blocks(out, data[block_start:pos])
        return

    # use topological sort to find shortest path
    predecessors = [(0, None, None)] + [None] * len(data)
    def put_edge(cost, length, distance):
        npos = pos + length
        cost += predecessors[pos][0]
        current = predecessors[npos]
        if not current or cost < current[0]:
            predecessors[npos] = cost, length, distance
    for pos in range(len(data)):
        if c := find_length_max():
            for l in range(3, c[0] + 1):
                put_edge(2 if l < 9 else 3, l, c[1])
        for l in range(1, min(32, len(data) - pos) + 1):
            put_edge(1 + l, l, 0)

    # reconstruct path, emit blocks
    blocks = []; pos = len(data)
    while pos > 0:
        _, length, distance = predecessors[pos]
        pos -= length
        blocks.append((pos, length, distance))
    for pos, length, distance in reversed(blocks):
        if not distance:
            emit_literal_block(out, data[pos:pos + length])
        else:
            emit_distance_block(out, length, distance)

@leah-potato

Fantastic write up! Huge thanks @mildsunrise

@vills

vills commented Aug 12, 2024

@mildsunrise really appreciate your work!

@svyatogor How did you get on with converting the codes from the SmartIR package to Tuya-compatible ones? Once you had them converted, how are you sending the codes to your IR blaster? Are you using TinyTuya or something like LocalTuya in Home Assistant? And if so, could you provide more details on how you configured things to work?

@svyatogor i combined definitions from that gist and other repo to convert Broadlink base64 encoded codes to Tuya - https://gist.github.com/vills/590c154b377ac50acab079328e4ddaf9

@magicus

magicus commented Oct 13, 2024

Thank you @mildsunrise for your work on this! I am trying to understand the code that my ZS06 is giving me; mostly out of curiosity -- for practical purposes I can just take the code as learned by the ZS06 and send it back and it will emulate the original remote control. But it annoys me that I don't fully understand this. :-)

The source of my confusion is that each time I try to learn the very same keypress, I get a different string back. With your code in this gist, I am able to decode it to an array of ints, but that only means I can now see how the values are different. Let me take a simple example. These strings are from me pressing the same key on an IR remote:

BVEjrBEmAkABAboG4AEDQAHAD0ABQAsCWgIm4AABQA/AAUALwANAAUAL4A8BwBtAB+ADAwGqneANh0BzwCdAAUALwAHAG8ABQBvAA0ABQAvgDwHAG0AHC7oGJgK6BiYCugYmAg==

BUQjtBEkAkABAb0G4AEDA1oCJALAD0ABQAvAAcAbQAHAC0AfQANAAUAHQBfgAwHAD0AbQAFAB+ADAwGMneANh0ABwCdAAUAL4AMB4AMPQAvAA0ABQAvAAcB/wAHAG0AHC70GJAK9BiQCvQYkAg==

BUkjwREoAkABAbcG4AEDA18CKALAD0ABQAvgAwHAD+ADJ8ATwAfgCwHAG0AH4AMDAZmd4A2HQAHAJ0ABQAvgAwHgAw9AC8ADQAFAC+APAcAbQAcLtwYoArcGKAK3BigC

BUojsBEeAkABAcMG4AEDA2ACHgLAD0ABQAtAE8ABQA/AAUALwANAAUAL4AMrQAHAD0AbQAFAB+ADAwHAneAhh8ABQFNAQ0ABwAtAD0ADQAHAB0ABQB9AAcAHwBtABwvDBh4CwwYeAsMGHgI=

BUYjsBEkAkABAb8G4AEDQAHAD0ABQAvAAQNeAiQC4AMPQAvAA0ABQAvgDwHAG0AH4AMDAYOd4CmHQAFAQ8ABQAvAA0ABQAtAq0ABwAfAAcAbQAcLvwYkAr8GJAK/BiQC

Decoded, they correspond to the following integer arrays (in the same order as the strings above):

[9041, 4524, 550, 550, 550, 1722, 550, 1722, 550, 1722, 550, 550, 550, 1722, 550, 1722, 550, 550, 550, 1722, 550, 602, 550, 550, 550, 550, 550, 1722, 550, 550, 550, 550, 550, 1722, 550, 1722, 550, 1722, 550, 550, 550, 1722, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 1722, 550, 550, 550, 1722, 550, 1722, 550, 1722, 550, 1722, 550, 40362, 9041, 4524, 550, 550, 550, 1722, 550, 1722, 550, 1722, 550, 602, 550, 1722, 550, 1722, 550, 550, 550, 1722, 550, 550, 550, 550, 550, 602, 550, 1722, 550, 550, 550, 550, 550, 1722, 550, 1722, 550, 1722, 550, 550, 550, 1722, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 550, 1722, 550, 550, 550, 1722, 550, 1722, 550, 1722, 550, 1722, 550]

[9028, 4532, 548, 548, 548, 1725, 548, 1725, 548, 1725, 548, 602, 548, 1725, 548, 1725, 548, 548, 548, 1725, 548, 548, 548, 548, 548, 602, 548, 1725, 548, 548, 548, 602, 548, 1725, 548, 1725, 548, 1725, 548, 548, 548, 1725, 548, 602, 548, 548, 548, 548, 548, 548, 548, 602, 548, 548, 548, 1725, 548, 548, 548, 1725, 548, 1725, 548, 1725, 548, 1725, 548, 40332, 9028, 4532, 548, 548, 548, 1725, 548, 1725, 548, 1725, 548, 548, 548, 1725, 548, 1725, 548, 548, 548, 1725, 548, 548, 548, 548, 548, 548, 548, 1725, 548, 548, 548, 548, 548, 1725, 548, 1725, 548, 1725, 548, 548, 548, 1725, 548, 548, 548, 548, 548, 602, 548, 548, 548, 548, 548, 548, 548, 1725, 548, 548, 548, 1725, 548, 1725, 548, 1725, 548, 1725, 548]

[9033, 4545, 552, 552, 552, 1719, 552, 1719, 552, 1719, 552, 607, 552, 1719, 552, 1719, 552, 552, 552, 1719, 552, 552, 552, 552, 552, 552, 552, 1719, 552, 552, 552, 607, 552, 1719, 552, 1719, 552, 1719, 552, 552, 552, 1719, 552, 552, 552, 552, 552, 552, 552, 552, 552, 552, 552, 552, 552, 1719, 552, 552, 552, 1719, 552, 1719, 552, 1719, 552, 1719, 552, 40345, 9033, 4545, 552, 552, 552, 1719, 552, 1719, 552, 1719, 552, 552, 552, 1719, 552, 1719, 552, 552, 552, 1719, 552, 552, 552, 552, 552, 552, 552, 1719, 552, 552, 552, 552, 552, 1719, 552, 1719, 552, 1719, 552, 552, 552, 1719, 552, 552, 552, 552, 552, 552, 552, 552, 552, 552, 552, 552, 552, 1719, 552, 552, 552, 1719, 552, 1719, 552, 1719, 552, 1719, 552]

[9034, 4528, 542, 542, 542, 1731, 542, 1731, 542, 1731, 542, 608, 542, 1731, 542, 1731, 542, 542, 542, 1731, 542, 608, 542, 542, 542, 542, 542, 1731, 542, 542, 542, 542, 542, 1731, 542, 1731, 542, 1731, 542, 542, 542, 1731, 542, 608, 542, 542, 542, 542, 542, 542, 542, 608, 542, 542, 542, 1731, 542, 542, 542, 1731, 542, 1731, 542, 1731, 542, 1731, 542, 40384, 9034, 4528, 542, 542, 542, 1731, 542, 1731, 542, 1731, 542, 608, 542, 1731, 542, 1731, 542, 542, 542, 1731, 542, 542, 542, 542, 542, 608, 542, 1731, 542, 542, 542, 608, 542, 1731, 542, 1731, 542, 1731, 542, 542, 542, 1731, 542, 542, 542, 542, 542, 608, 542, 542, 542, 608, 542, 542, 542, 1731, 542, 542, 542, 1731, 542, 1731, 542, 1731, 542, 1731, 542]

[9030, 4528, 548, 548, 548, 1727, 548, 1727, 548, 1727, 548, 548, 548, 1727, 548, 1727, 548, 548, 548, 1727, 548, 548, 548, 548, 548, 606, 548, 1727, 548, 548, 548, 548, 548, 1727, 548, 1727, 548, 1727, 548, 548, 548, 1727, 548, 548, 548, 548, 548, 548, 548, 548, 548, 548, 548, 548, 548, 1727, 548, 548, 548, 1727, 548, 1727, 548, 1727, 548, 1727, 548, 40323, 9030, 4528, 548, 548, 548, 1727, 548, 1727, 548, 1727, 548, 548, 548, 1727, 548, 1727, 548, 548, 548, 1727, 548, 548, 548, 548, 548, 548, 548, 1727, 548, 548, 548, 548, 548, 1727, 548, 1727, 548, 1727, 548, 548, 548, 1727, 548, 606, 548, 548, 548, 606, 548, 548, 548, 548, 548, 548, 548, 1727, 548, 606, 548, 1727, 548, 1727, 548, 1727, 548, 1727, 548]

There is an obvious pattern to these arrays, but it is not trivial to understand what it means. The naive interpretation, given your explanation above, would be that the length of each signal inversion would be consistent within each learning attempt, but vary between attempts. That does not seem to be a reasonable conclusion. Instead, I guess the values are somehow influenced by a per-learning based code. I'm still wrapping my head around this and trying to understand what I see.

@mildsunrise
Author

mildsunrise commented Oct 13, 2024

the five signals look pretty similar to me. out of curiosity I did the math and the intervals are very consistent, with a standard deviation of less than 10µs for almost all of them. so I'm pretty confident that they are all carrying the exact same data.
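
A sketch of that check, using only the first three durations of each of the five captures above to keep it short. It uses the population standard deviation from `statistics`, which is an assumption; the comment doesn't say which variant was computed.

```python
from statistics import pstdev

# First three durations (µs) of each of the five captures above.
captures = [
    [9041, 4524, 550],
    [9028, 4532, 548],
    [9033, 4545, 552],
    [9034, 4528, 542],
    [9030, 4528, 548],
]

# Compare the same position across the five captures.
spread = [pstdev(column) for column in zip(*captures)]
for position, sigma in enumerate(spread, start=1):
    print(f'interval {position}: sigma = {sigma:.1f} µs')
```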

you're never going to get the exact same signal captured twice, because unless you're in a lab environment, your medium will have other IR sources introducing noise, which will make the signal vary slightly. cheap remotes may also not have a very stable clock, and their pulses may be slightly longer or shorter each time.

the job of protocols (like NEC) is to encode the data in a way that is resistant to these variations and other adverse effects

@mildsunrise
Author

mildsunrise commented Oct 13, 2024

now, if you're interested in decoding the signal to obtain the bits that are transmitted in it, I'm not sure which protocol that's using (I'm not very knowledgeable about IR protocols) but it can somehow be guessed... first, notice that in the middle of the signal there is a pretty big 40ms spacing (i.e. a low period). and it is exactly in the middle of the signal. it's common for remotes to transfer the code multiple times when you press a key (just in case one of the transmissions gets damaged by noise).

and indeed, if we check the signal before the 40ms we see that it matches the signal after the 40ms, so it's pretty safe to assume one is a retransmission of the other. so let's look only at the first retransmission. it starts with a 9ms pulse followed by a 4.5ms space. this is also common, the remote sends a big pulse to indicate that it's about to start transmitting a message.

after the initial pulse and spacing, note how all of the values are either ~1.7ms or ~0.55ms. in particular, the high parts are always 0.55ms while the low parts in between them are either 0.55ms or 1.7ms. so it's safe to say this protocol encodes bits as the spaces between 0.55ms pulses. if we treat 0.55ms spaces as a 0 and 1.7ms spaces as a 1, then your signals carry the following data: 01110110100010011101000000101111

I encourage you to do this decoding for all of the keys in your remote, and you may be able to guess what each bit means! you can also experiment by constructing your own messages, sending them to your appliance and seeing what that does. but be aware that some of the bits may be control bits to detect damaged transmissions, and changing some of the bits without recalculating the control bits as appropriate may make your appliance reject the message.
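
A sketch of that bit extraction, assuming the layout described above: one leading pulse and space, then one bit per space between short pulses. The function name and the 1100 µs and 10000 µs thresholds are choices made for this example. The sample is the first 19 durations of the third capture, which cover the first 8 bits.

```python
def decode_pulse_distance(signal: list[int], threshold: int = 1100) -> str:
    """Read one frame: skip the leading pulse and space, then map each
    space (low period) to '0' if short or '1' if long."""
    bits = []
    # Index 0 and 1 are the preamble; after that, odd indices are spaces.
    for space in signal[3::2]:
        if space > 10000:      # long gap: the frame is over
            break
        bits.append('1' if space > threshold else '0')
    return ''.join(bits)

first_byte = [9033, 4545,
              552, 552, 552, 1719, 552, 1719, 552, 1719,
              552, 607, 552, 1719, 552, 1719, 552, 552,
              552]
print(decode_pulse_distance(first_byte))
```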

@magicus

magicus commented Oct 14, 2024

Thanks for your detailed reply! I noticed the rough pattern, where all entries were about an integer multiple of the lowest (550) value (16, 8, 4 and 3 times this value), but what really threw me off was how these values varied, but not during a single keypress. My thinking was that if there is some imprecision, the values would have been more like 550, 554, 548, 548, 551 in a single transmission, not that all of them would be 550 and all of the ones in the next one would be 548, etc.

But maybe as you say, if the IR transmitter has an imprecise clock, this is what we can get -- for a while it runs a bit slower and then it runs a bit faster.

I'll check around on the documentation of IR protocols and see if this seems to match a known pattern.

And I'll try to write a "cleaner" script which converts the values to the best match of known intervals; while I have had success in replicating the remote functionality with any of the patterns that I have tested, it would probably make sense to have the blaster send a code that is as close as possible to the value the remote is supposed to send. Also, it would look a lot more tidy in my Home Assistant rules if I could encode a key press as 01110110100010011101000000101111 rather than BUYjsBEkAkABAb8G4AEDQAHAD0ABQAvAAQNeAiQC4AMPQAvAA0ABQAvgDwHAG0AH4AMDAYOd4CmHQAFAQ8ABQAvAA0ABQAtAq0ABwAfAAcAbQAcLvwYkAr8GJAK/BiQC. :)
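
Such a "cleaner" could look roughly like this. It is only a sketch: the nominal values are rounded NEC timings (an assumption at this point in the thread), the 25% tolerance is arbitrary, and `snap_to_nominal` is a name invented for the example.

```python
def snap_to_nominal(signal, nominal=(560, 1690, 4500, 9000), tolerance=0.25):
    """Replace each duration with the closest nominal value when it lies
    within `tolerance` (relative) of it; otherwise keep it unchanged."""
    cleaned = []
    for t in signal:
        best = min(nominal, key=lambda n: abs(n - t))
        cleaned.append(best if abs(best - t) <= tolerance * best else t)
    return cleaned

# A few values taken from the first capture, including the long gap.
print(snap_to_nominal([9041, 4524, 550, 1722, 602, 40362]))
```

Durations that match none of the nominal values, such as the gap between repeats, pass through untouched.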

@magicus

magicus commented Oct 14, 2024

This seems to be the common NEC encoding (https://techdocs.altium.com/display/FPGA/NEC+Infrared+Transmission+Protocol). Interestingly, it should formally have a 562.5µs pulse, which cannot be properly described by the integer values of the Tuya encoding. I guess there are several µs tolerances involved in IR devices, so that a 0.5 µs difference is irrelevant.

@magicus

magicus commented Oct 14, 2024

FTR, here is a reconstructed interpretation of what the key press "should" have looked like, using official NEC timings:

[
// preamble
9000, 4500, 

// address byte
560, 560, 
560, 1690, 
560, 1690, 
560, 1690, 
560, 560, 
560, 1690, 
560, 1690, 
560, 560, 
// address byte, inverted
560, 1690, 
560, 560, 
560, 560, 
560, 560, 
560, 1690, 
560, 560, 
560, 560, 
560, 1690, 
// command byte
560, 1690, 
560, 1690, 
560, 560, 
560, 1690, 
560, 560, 
560, 560, 
560, 560, 
560, 560, 
// command byte, inverted
560, 560, 
560, 560, 
560, 1690, 
560, 560, 
560, 1690, 
560, 1690, 
560, 1690, 
560, 1690, 
// trailer
560]

which corresponds to exactly the bit pattern you mentioned.
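
The same list can be generated instead of written by hand. This is a sketch assuming standard NEC framing with each byte sent least-significant bit first; under that assumption the list above corresponds to address 0x6E and command 0x0B, values inferred from the bits rather than taken from any documentation for this remote.

```python
def nec_timings(address: int, command: int) -> list[int]:
    """Build the raw timings (µs) of one NEC frame: address, inverted
    address, command, inverted command, each sent LSB first."""
    PULSE, SPACE_0, SPACE_1 = 560, 560, 1690
    timings = [9000, 4500]                      # preamble
    for byte in (address, address ^ 0xFF, command, command ^ 0xFF):
        for bit in range(8):
            is_one = (byte >> bit) & 1
            timings += [PULSE, SPACE_1 if is_one else SPACE_0]
    timings.append(PULSE)                       # trailer closes the last space
    return timings

frame = nec_timings(0x6E, 0x0B)
bits = ''.join('1' if space == 1690 else '0' for space in frame[3::2])
print(len(frame), bits)
```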

@mildsunrise
Author

I noticed the rough pattern, where all entries were about an integer multiple of the lowest (550) value (16, 8, 4 and 3 times this value), but what really threw me off was how these values varied, but not during a single keypress. My thinking was that if there is some imprecision, the values would have been more like 550, 554, 548, 548, 551 in a single transmission, not that all of them would be 550 and all of the ones in the next one would be 548, etc.

ahh, I'm pretty sure that's the Tuya heavily quantizing the values so that they compress better (since they are using lossless compression, which is based on repetition). if you want to get a more precise signal description, you'll have to use an Arduino or (if you don't want it to even be binarized) an oscilloscope connected to an IR sensor

I guess there are several µs tolerances involved in IR devices, so that a 0.5 µs difference is irrelevant.

oh yeah, tolerances are much much bigger than just a few µs, otherwise the quantization would have to be much less aggressive

while I have had success in replicating the remote functionality with any of the patterns that I have tested, it would probably make sense to have the blaster send a code that is as close as possible to the value the remote is supposed to send

that's true, but don't worry too much about it, the tolerances are very lax

Also, it would look a lot more tidy in my Home Assistant rules if I could encode a key press as 01110110100010011101000000101111 rather than BUYjsBEkAkABAb8G4AEDQAHAD0ABQAvAAQNeAiQC4AMPQAvAA0ABQAvgDwHAG0AH4AMDAYOd4CmHQAFAQ8ABQAvAA0ABQAtAq0ABwAfAAcAbQAcLvwYkAr8GJAK/BiQC. :)

yeah, that's IMO the best reason for encoding your own signals. especially for more complicated appliances like HVACs, where the transmissions encode complicated configurations like temperature, timer settings and so on, which would be a nightmare to store one by one

@mildsunrise
Author

also, happy to see that it's standard NEC :)

@Bazoogle

Great write up! I have been trying to figure out the compression method for several days, and I couldn't do it. I wish I had found this a long time ago. Your explanation was very good, better than the official documentation of the FastLZ compression. How long did it take you to figure out the compression method, and how did you get set on the right path? I also think it's impressive you figured it out without understanding the IR protocols, because that was my starting point.

@mildsunrise
Author

mildsunrise commented Jan 2, 2025

thanks! it was lots of trial and error I'm afraid :) knowing that the compressed data was a list of u16le helped rule many hypotheses out

@tonipetrovic

tonipetrovic commented Jan 3, 2025

Dear @mildsunrise,

I have a Panasonic AC and just found that someone reverse engineered its IR codes here: https://www.instructables.com/Reverse-engineering-of-an-Air-Conditioning-control/. Is it possible to somehow use this logic and generate the Tuya IR code on the fly? Could you be so kind and help me with it? I have this IR blaster: https://www.zigbee2mqtt.io/devices/UFO-R11.html#moes-ufo-r11.

Thank you and kind regards,
Toni

@burkminipup

Wow, amazing work!

I guess I am a little confused. Is this code for creating a Tuya ZS08 IR code (I use the UFO-11 -- battery version) from a community online NEC code, or merely for compressing the IR signal that was derived from using [andrewcchen/tuya_ir_encode.js] code?

Do you have a specific database you would suggest or have successfully tested from? I have been running the scripts for about 6hrs, but without much success on various IR codes/equipment. I have tried github.com/probonopd/irdb codes database, as well as tried manipulating the commercial codes from /irdb.globalcache.com/ (It is a little better organized to find correctly identified equipment).

If you are using github.com/probonopd/irdb, what would be the best way to format the NEC code from the CSV for the script? Example:
root@debian-ir-hex:~/IRDB/irdb/codes/Samsung/TV# cat 7,7.csv
functionname,protocol,device,subdevice,function
INPUT SOURCE,NECx2,7,7,1

This is the kind of string output I am getting, but this doesn't seem to be formatted for the Tuya device (or at least the signal is not sending correctly):

7,7.csv
Function: INPUT SOURCE
NEC Code: 07F801FE
Tuya IR Code: KCOUETACMAIwAjACMAIwAjACMAIwAjACMAKaBjACmgYwApoGMAKaBjACmgYwApoGMAKaBjACmgYwAjACMAIwAjACMAIwAjACMAIwAjACMAIwAjACMAIwAjACMAIwAjACMAKaBjACmgYwApoGMAKaBjACmgYwApoGMAKaBjACmgYwAjACMAI=

This is an example random previously learned button (UFO-11 connected to Zigbee2MQTT):

DBQjABIXArIGFwJeAhcgAQBeIAEEFwKyBl4gAwIXAl4gAQAXIAFABUAPArIGXiADARcCwAFAC0ADBBcCXgIXIAGACwEXAoAN4AEFQBcBsgaAD0AHAbIGQAfgAQMHRpwUI/MIFwI=

Thanks for any feedback. Much appreciated!
-Daniel

@Bazoogle

Is this code for creating a Tuya ZS08 IR code (I use the UFO-11 -- battery version) from a community online NEC code, or merely for compressing the IR signal that was derived from using [andrewcchen/tuya_ir_encode.js] code?

This script is used for generating the input "code" you feed the Zigbee IR blaster with a cluster command. The code is generated from the raw timing data that an actual remote sends. Previously, you had to learn the code from the device, you couldn't just generate it. Look up the "NEC" protocol to see how the raw timing is generated, and you'll fully understand how it works. You can also use a tool like IrScrutinizer to convert the NEC device, subdevice, and command into the raw timing data. You should end up with an array of on and off timings, like [6500,4500,560,810], which is how many microseconds the IR light is on, then off, then on... repeating for each index in the array (as mentioned in the original post). Every protocol generates the raw timing data slightly differently, so to be consistent you have to feed it the actual raw timing info, and not just the NEC codes.

@burkminipup

Ah thanks a lot! That makes sense. I'm not always sure what I am looking at when I end up with an output code.

I'll dig into IrScrutinizer soon after this. In the meantime, I'm using [irdb.globalcache.com/Home/Database]. So basically what I'm understanding is the HEX code data won't be used at all, and I would just use the raw timing data (Or convert the HEX to raw timing if necessary)?

This is just an example from Global Cache:
function, code1, hexcode1, code2, hexcode2

"ANGLE","sendir,1:1,1,38000,1,69,342,170,21,21,21,21,21,64,21,64,21,64,21,64,21,21,21,21,21,21,21,21,21,64,21,21,21,21,21,21,21,64,21,64,21,21,21,21,21,21,21,64,21,21,21,21,21,21,21,21,21,64,21,64,21,64,21,21,21,64,21,64,21,64,21,64,21,1556,342,85,21,3647","0000 006D 0022 0002 0156 00AA 0015 0015 0015 0015 0015 0040 0015 0040 0015 0040 0015 0040 0015 0015 0015 0015 0015 0015 0015 0015 0015 0040 0015 0015 0015 0015 0015 0015 0015 0040 0015 0040 0015 0015 0015 0015 0015 0015 0015 0040 0015 0015 0015 0015 0015 0015 0015 0015 0015 0040 0015 0040 0015 0040 0015 0015 0015 0040 0015 0040 0015 0040 0015 0040 0015 0614 0156 0055 0015 0E3F",,

I can basically just extract the code1 raw data, and use it:
"342,170,21,21,21,21,21,64,21,64,21,64,21,64,21,21,21,21,21,21,21,21,21,64,21,21,21,21,21,21,21,64,21,64,21,21,21,21,21,21,21,64,21,21,21,21,21,21,21,21,21,64,21,64,21,64,21,21,21,64,21,64,21,64,21,64,21,1556,342,85,21,3647"

I was originally using [www.yamaha.com/ypab/irhex_converter.asp] for HEX conversion from an irdb NEC (I think Yamaha also uses more rounding rather than a bit-for-bit conversion), but I take it that is no longer necessary since I can just convert it directly into a raw timing code from the NEC code -- Now I'll just have to find an efficient way to convert irdb NEC to raw timing if I don't use IrScrutinizer. Any good Linux packages you recommend for this using CLI?
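
One caveat about the `sendir` line above: as far as I understand the Global Caché format, the on/off numbers are counts of carrier periods rather than microseconds, so they would need scaling by the carrier frequency before being encoded. The sketch below assumes that field layout (`sendir`, connector, ID, frequency, repeat, offset, then on/off pairs); treat it as an assumption to check against Global Caché's own documentation. `sendir_to_micros` is a name invented here.

```python
def sendir_to_micros(sendir: str) -> list[int]:
    """Convert the on/off counts of a `sendir` string into µs durations,
    assuming the counts are carrier periods at the stated frequency."""
    fields = sendir.split(',')
    frequency = int(fields[3])               # carrier frequency in Hz
    counts = [int(x) for x in fields[6:]]    # skip the six header fields
    return [round(n * 1_000_000 / frequency) for n in counts]

# Shortened version of the line above, for illustration only.
timings = sendir_to_micros('sendir,1:1,1,38000,1,69,342,170,21,21,21,64')
print(timings)
```

Under this interpretation 342 periods at 38 kHz come out at 9000 µs and 170 periods at about 4474 µs, which lines up with the NEC preamble.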

Thanks,
-Daniel

@Bazoogle

So basically what I'm understanding is the HEX code data won't be used at all, and I would just use the raw timing data

When you say HEX code data, if you mean the hex format for the NEC device, subdevice, and function, then yes. The hex portion you provided above seems to me like the timing data in decimal, and the timing data again in hex. If you used the above script, it should just be the decimal data in a python list as integers.

Any good Linux packages you recommend for this using CLI?

I used IRGen on Windows and it worked for my purposes, though there may be something better. You can just plug in the values from IRDB for device, subdevice, and function. Though the raw data needs the symbols removed. Add that to a list and feed it into encode_ir and that code should work on the IR blaster.

Something that's worth noting, if you have any timings larger than 65,535 you will need to just set it to 65,535, since each timing is represented with 2 bytes. I had some like that at the end of the commands, but it worked fine just shortening it.
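
A minimal sketch of that clamping step, to run on the timings before they are packed as 16-bit values. `clamp_timings` is a name made up for this example.

```python
from struct import pack

U16_MAX = 0xFFFF  # largest duration a 2-byte field can hold (65,535 µs)

def clamp_timings(signal: list[int]) -> list[int]:
    """Cap every duration at 65,535 µs so it fits in an unsigned 16-bit field."""
    return [min(t, U16_MAX) for t in signal]

clamped = clamp_timings([9000, 4500, 560, 95974])
packed = b''.join(pack('<H', t) for t in clamped)   # would raise without the clamp
print(clamped)
```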

@burkminipup

When you say HEX code data, if you mean the hex format for the NEC device, subdevice, and function, then yes. The hex portion you provided above seems to me like the timing data in decimal, and the timing data again in hex. If you used the above script, it should just be the decimal data in a python list as integers.

Yeah, I had an ah-ha moment yesterday when I was overcomplicating the fact that they are just in HEX or decimal notation, with the decimal notation having the encoded timing.

I finally leaned into IrScrutinizer after several hours of building conversion scripts through the CLI; I wish I had looked into it earlier to save a lot of time. I used kdschlosser/pyIRDecoder for the NEC to decimal conversion, but I don't recommend it with modified scripts, because I am still getting errors with most formats that aren't NEC: it doesn't play well with irdb formatting, and it is not published on PyPI, so no install packages are available (git clone was the best I could do). However, with NEC it seems to be a lot faster at bulk IR code conversion than IrScrutinizer (assuming it is NEC, or another protocol that plays nice with the irdb CSVs).

I was able to get the script to work with a more repeatable workflow (work in progress):

root@debian-ir-hex:~# brands list Sa
Fetching available brands...
SAB
SABA
Sagem
Salora
Samsung
Samy
Sansonic
Sansui
Sanyo
Satelco
root@debian-ir-hex:~# brands get Sanyo
Downloading all IR codes for brand: Sanyo
Download complete for Sanyo.
root@debian-ir-hex:~# python3 1_prompt_irdb_to_raw.py 

Available Brands:
 - Sanyo
 - Yamaha
 - Samsung
 - BenQ

Enter the brand folder name:
> Sanyo

Available CSV Files:
 - Unknown_B13540/49,-1.csv
 - Unknown_sanyo-tv01/56,-1.csv
 - Unknown_sanyoB13537/49,-1.csv
 - Unknown_A05800/49,-1.csv
 - Unknown_B12628/49,-1.csv
 - Unknown_RC700/56,-1.csv
 - Unknown_RC-105C/54,-1.csv
 - Unknown_Sanyo/56,-1.csv
 - Unknown_Sanyo/49,-1.csv
 - Unknown_VCR/49,-1.csv
 - TV/56,-1.csv
 - Unknown_RB-SL22/60,196.csv
 - Unknown_Sanyo-B13509/49,-1.csv
 - Unknown_Sanyo-JXZB/56,-1.csv
 - Unknown_B01004/49,-1.csv
 - Unknown_RB-DA300/60,-1.csv
 - Unknown_B01007/49,-1.csv
 - Video Projector/48,-1.csv
 - Unknown_TV/56,-1.csv

Enter the CSV file name:
> Unknown_RB-SL22/60,196.csv

Here are known valid protocols in pyIRDecoder:

[OMITTED PROTOCOL LIST]

Use automatic protocol(s) (NEC)? [Y/n]: y

[OMITTED EXTRA CODES]

[PROCESSING] Function: KEY_STOP
[SUCCESS] Function: KEY_STOP
Protocol: NEC
Raw Decimal Timing: [9024, 4512, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 40884]

[OMITTED EXTRA CODES]

root@debian-ir-hex:~# python3 2_prompt_raw_to_tuya.py 
Enter 'e' for Encode (Raw Timing) or 'd' for Decode (Tuya IR Code):
> e

Enter the raw IR signal as a comma-separated list (e.g., 9000,4500,560,1690,...):
> 9024, 4512, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 40884

Generated Tuya IR Code:
BUAjoBE0AsABAZwG4AUD4AcB4AcT4AMn4BM74A9LwAMBtJ8=
root@debian-ir-hex:~# 

Worked like a charm first try in Zigbee2MQTT with my UFO-11.
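
As a sanity check, the generated string can be decoded back into timings and compared with the raw list. The sketch below assumes the format documented at the top of this gist (base64, then a stream of literal and length-distance blocks, then little-endian 16-bit durations). `tuya_decode` is an illustrative name, not the gist's own function:

```python
import base64
import struct


def tuya_decode(code: str) -> list[int]:
    """Decode a Tuya IR code string into a list of durations (microseconds)."""
    data = base64.b64decode(code)
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        length, dist = header >> 5, header & 0x1F
        if length == 0:
            # Literal block: the low 5 bits hold (byte count - 1).
            count = dist + 1
            out += data[i:i + count]
            i += count
        else:
            # Length-distance block: a length field of 7 means an extra
            # length byte follows the header.
            if length == 7:
                length += data[i]
                i += 1
            length += 2
            dist = ((dist << 8) | data[i]) + 1
            i += 1
            # Copy one byte at a time so overlapping references repeat.
            for _ in range(length):
                out.append(out[-dist])
    # The decompressed payload is a sequence of little-endian uint16 values.
    return list(struct.unpack("<%dH" % (len(out) // 2), bytes(out)))


timings = tuya_decode("BUAjoBE0AsABAZwG4AUD4AcB4AcT4AMn4BM74A9LwAMBtJ8=")
print(len(timings), timings[:3], timings[-1])  # 68 [9024, 4512, 564] 40884
```

The decoded list matches the "Raw Decimal Timing" line printed for KEY_STOP above.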

Thanks for all the help!

magicus commented Jan 13, 2025

@burkminipup What are the scripts you refer to (brands, 1_prompt_irdb_to_raw.py and 2_prompt_raw_to_tuya.py)? Did you write them yourself? If so, can you please publish them? Otherwise, can you please provide a link?

burkminipup commented Jan 14, 2025

Yes, they are custom, but they are not refined, and I am taking a break from the universal remote project for a while. If you're still interested, I'm on Discord @burkminipup (I'm new to GitHub, so I'm not sure how code sharing works).

I have four Python scripts, plus a "brands" script in PATH for searching and downloading brands. Two of the scripts convert irdb codes to decimal timings, and the other two take those outputs and convert them to Tuya codes using the script above. As mentioned in my last post, the scripts are only "fully" tested with the NEC protocol, but they could be rewritten to support more (I don't have the time to debug all 143 protocols).

The scripts are great for bulk code converting based on file path (you don't have to do it per CSV). They can also bulk convert an entire remote output to Tuya format (only using the specific output format provided through the scripts). The snippet above was the script for a specific remote (or CSV file).

root@proxmox:~# pct enter 125
root@debian-ir-hex:~# ls
1_prompt_irdb_to_raw.py  3_bulk_irdb_to_raw.py	brands	pyIRDecoder
2_prompt_raw_to_tuya.py  4_bulk_raw_to_tuya.py	IRDB	z_Archive

Example of bulk testing for when irdb has unknown or poorly documented device labels (which is the case for most of irdb):

root@debian-ir-hex:~# grep -ri --include="*.csv" "setup" /root/IRDB/irdb/codes/Sanyo/
/root/IRDB/irdb/codes/Sanyo/Unknown_sanyo-tv01/56,-1.csv:KEY_SETUP,NEC,56,-1,23
/root/IRDB/irdb/codes/Sanyo/Unknown_RB-SL22/60,196.csv:KEY_SETUP,NEC,60,196,2
root@debian-ir-hex:~# python3 3_bulk_irdb_to_raw.py 

Paste your CSV file path lines below. Press ENTER until all codes appear and then CTRL+D to exit):

/root/IRDB/irdb/codes/Sanyo/Unknown_sanyo-tv01/56,-1.csv:KEY_SETUP,NEC,56,-1,23
/root/IRDB/irdb/codes/Sanyo/Unknown_RB-SL22/60,196.csv:KEY_SETUP,NEC,60,196,2


===========================================================================
File Path  : /root/IRDB/irdb/codes/Sanyo/Unknown_sanyo-tv01/56,-1.csv
Function   : KEY_SETUP
Raw Timing : [9024, 4512, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 45396]
===========================================================================

===========================================================================
File Path  : /root/IRDB/irdb/codes/Sanyo/Unknown_RB-SL22/60,196.csv
Function   : KEY_SETUP
Raw Timing : [9024, 4512, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 40884]
===========================================================================

[This is where Enter and Ctrl+D were pressed]

root@debian-ir-hex:~# python3 4_bulk_raw_to_tuya.py 

Paste your formatted IR data below, beginning with and ending with '=' per decimal section.


Press CTRL+D twice (or CTRL+Z on Windows) when done.

===========================================================================
File Path  : /root/IRDB/irdb/codes/Sanyo/Unknown_sanyo-tv01/56,-1.csv
Function   : KEY_SETUP
Raw Timing : [9024, 4512, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 45396]
===========================================================================

===========================================================================
File Path  : /root/IRDB/irdb/codes/Sanyo/Unknown_RB-SL22/60,196.csv
Function   : KEY_SETUP
Raw Timing : [9024, 4512, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 1692, 564, 564, 564, 1692, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 564, 1692, 564, 564, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 1692, 564, 40884]
===========================================================================

[This is where I pressed Ctrl+D twice]

============================================================

Function: KEY_SETUP
Generated Tuya IR Code:
BUAjoBE0AuADAQGcBuABA+AfAeAHM+ATO+ADI8ADAVSx

Function: KEY_SETUP
Generated Tuya IR Code:
BUAjoBE0AsABAZwG4AUD4AcB4AcT4AMn4Asv4Ac34AdfwAMBtJ8=

============================================================

root@debian-ir-hex:~# 

I have some scratch documentation, and I will run through its steps to see if I can replicate the setup before sharing anything. This is all running on a Proxmox Debian 12 (Bookworm) LXC, so I am unsure about compatibility with other systems.
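
For readers without pyIRDecoder, the NEC rows above can be reproduced with a short sketch. Everything here is inferred from the printed timings rather than taken from the scripts themselves: a 564 µs base unit, bytes sent least significant bit first, the function byte followed by its bitwise inverse, and a final gap that pads the frame to 108 ms. The name `nec_raw_timings` is made up for illustration, and the sketch does not handle irdb's -1 subdevice placeholder:

```python
UNIT = 564        # microseconds; base unit seen in the output above
FRAME = 108_000   # microseconds; assumed total length of one frame


def nec_raw_timings(device: int, subdevice: int, function: int) -> list[int]:
    """Build on/off durations for one NEC frame (values 0..255 only)."""
    payload = [device, subdevice, function, function ^ 0xFF]
    timings = [16 * UNIT, 8 * UNIT]           # leader: 9024 on, 4512 off
    for byte in payload:
        for bit in range(8):                  # least significant bit first
            space = 3 * UNIT if (byte >> bit) & 1 else UNIT
            timings += [UNIT, space]          # a 1 bit has the longer space
    timings.append(UNIT)                      # trailing mark
    timings.append(FRAME - sum(timings))      # gap that fills up the frame
    return timings


# KEY_SETUP from Unknown_RB-SL22/60,196.csv: device 60, subdevice 196, function 2
setup = nec_raw_timings(60, 196, 2)
print(len(setup), setup[-1])  # 68 40884
```

For this row the result is identical to the "Raw Timing" list printed above.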

burkminipup commented Jan 15, 2025

In order to avoid thread hijacking, I have incorporated the above scripts into a new project. Feel free to try it out. Feedback is welcomed, as this is my first GitHub code contribution:

https://github.com/burkminipup/irdb-to-tuya/

Thanks for all the help in these comments getting me on the right track.

magicus commented Jan 15, 2025

👍 Thanks! I'll check out your repo and have a look at it, and continue any discussion related to your scripts over there.

@burkminipup

> Something that's worth noting, if you have any timings larger than 65,535 you will need to just set it to 65,535, since each timing is represented with 2 bytes. I had some like that at the end of the commands, but it worked fine just shortening it.

Thanks @Bazoogle, I ran into this issue on a new remote last night. I updated the code to clamp any timing greater than 65535 down to exactly 65535, and it worked.
