mildsunrise/README.md

## README.md

      
    Raw
  

              README.md
            
          
    Tuya's IR blasters, like the ZS08, have the ability to both learn
and blast generic IR codes. These IR codes are given to the user as an
opaque string, like this:
A/IEiwFAAwbJAfIE8gSLIAUBiwFAC+ADAwuLAfIE8gSLAckBRx9AB0ADBskB8gTyBIsgBQGLAUALA4sB8gRAB8ADBfIEiwHJAeARLwHJAeAFAwHyBOC5LwGLAeA97wOLAfIE4RcfBYsB8gTyBEAFAYsB4AcrCYsB8gTyBIsByQHgPY8DyQHyBOAHAwHyBEAX4BVfBIsB8gTJoAMF8gSLAckB4BUvAckB4AEDBfIEiwHJAQ==

Not much is known about the format of these IR code strings, which makes
it difficult to use codes obtained through other means (such as a
manual implementation of the IR protocol for a particular device, or
public Internet code tables) with these blasters, as well as to use codes
learnt through these blasters with other brands of blasters and
study their contents.
So far I've only been able to find one person who dug into this before me,
who was able to understand it enough to create their own codes to
blast, but not enough to understand codes learnt by the device.
This document attempts to fully document the format and also provides
a (hopefully) working Python implementation.
Overview

There is no standard for IR codes, so appliances use different
methods to encode the data into an IR signal, often called "IR protocols".
A popular one, which could be considered an unofficial standard, is
the NEC protocol. NEC specifies a way to encode 16 bits as a
series of pulses of modulated IR light, but it's just one protocol.
Tuya's IR blasters are meant to be generic and work with just about any
protocol. To do that, they work at a lower level and record the IR signal
directly instead of detecting a particular protocol and decoding the bits.
In particular, the blaster records a binary signal like this one:
  +------+     +----------+  +-+
  |      |     |          |  | |
--+      +-----+          +--+ +---

Such a signal can be represented by noting the times at which the signal
flips from low to high and viceversa. It is better to record the differences
of these times as they will be smaller numbers. For example, the above signal
is represented as:
[7, 6, 11, 3, 2]
Meaning, the signal stays high for 7 units of time, then low for 6 units of
time, then high for 11 units, and so on. The first time is always for a high
state, which means even times (2nd, 4th, 6th...) are always low periods while
odd times (1st, 3rd, 5th...) are always high periods.
The blaster takes these numbers (in units of microseconds) and encodes each
of them as a little-endian 16-bit integer, resulting in the following 10 bytes:
07 00 06 00 0B 00 03 00 02 00

Because we're recording a signal rather than high-level protocol data,
this results in very long messages in real life. So, the blaster
compresses these bytes using a weird algorithm (see below),
and then encodes the resulting bytes using base64 so the user can copy/paste
the code easily.
Compression scheme


Update: Turns out this is FastLZ compression.
No need to read this section, you can go to their website instead.

I was unable to find a public algorithm that matched this, so I'm
assuming it's a custom lossless compression algorithm that a
random Tuya employee hacked to make my life more complicated.
Jokes aside it seems to be doing a very poor job, and if I were
them I would've just used Huffman coding or something.
Anyway, the algorithm is LZ77-based, with a fixed 8kB window.
The stream contains a series of blocks. Each block begins with a
"header byte", and the 3 MSBs of this byte determine the type of block:


If the 3 bits are zero, then this is a literal block and the other
5 bits specify a length L minus one.
Upon encountering this block, the decoder consumes the next L bytes
from the stream and emits them as output.
+---------+-----------------------------+
|000LLLLLL| 1..32 bytes, depending on L |
+---------+-----------------------------+


If the 3 bits have any other value, then this is a
length-distance pair block; the 3 bits specify a length L minus 2,
and the concatenation of the other 5 bits with the next byte specifies
a distance D minus 1.
Upon encountering this block, the decoder copies L bytes from the
previous output. It begins copying D bytes before the output cursor,
so if D = 1, the first copied byte is the most recently emitted byte;
if D = 2, the byte before that one, and so on.
As usual, it may happen that L > D, in which case the output
repeats as necessary (for example if L = 5 and D = 2, and the 2 last
emitted bytes are X and Y, the decoder would emit XYXYX).
+--------+--------+
|LLLDDDDD|DDDDDDDD|
+--------+--------+

As a special case, if the 3 bits are one, then there's an extra
byte preceding the distance byte, which specifies a value to be added to L:
+--------+--------+--------+
|111DDDDD|LLLLLLLL|DDDDDDDD|
+--------+--------+--------+


## tuya.py
import io
import base64
from bisect import bisect
from struct import pack, unpack


# MAIN API

def decode_ir(code: str) -> list[int]:
	'''
	Decodes an IR code string from a Tuya blaster.
	Returns the IR signal as a list of µs durations,
	with the first duration belonging to a high state.
	'''
	payload = base64.decodebytes(code.encode('ascii'))
	payload = decompress(io.BytesIO(payload))

	signal = []
	while payload:
		assert len(payload) >= 2, \
			f'garbage in decompressed payload: {payload.hex()}'
		signal.append(unpack('<H', payload[:2])[0])
		payload = payload[2:]
	return signal

def encode_ir(signal: list[int], compression_level=2) -> str:
	'''
	Encodes an IR signal (see `decode_tuya_ir`)
	into an IR code string for a Tuya blaster.
	'''
	payload = b''.join(pack('<H', t) for t in signal)
	compress(out := io.BytesIO(), payload, compression_level)
	payload = out.getvalue()
	return base64.encodebytes(payload).decode('ascii').replace('\n', '')


# DECOMPRESSION

def decompress(inf: io.FileIO) -> bytes:
	'''
	Reads a "Tuya stream" from a binary file,
	and returns the decompressed byte string.
	'''
	out = bytearray()

	while (header := inf.read(1)):
		L, D = header[0] >> 5, header[0] & 0b11111
		if not L:
			# literal block
			L = D + 1
			data = inf.read(L)
			assert len(data) == L
		else:
			# length-distance pair block
			if L == 7:
				L += inf.read(1)[0]
			L += 2
			D = (D << 8 | inf.read(1)[0]) + 1
			data = bytearray()
			while len(data) < L:
				data.extend(out[-D:][:L-len(data)])
		out.extend(data)

	return bytes(out)


# COMPRESSION

def emit_literal_blocks(out: io.FileIO, data: bytes):
	for i in range(0, len(data), 32):
		emit_literal_block(out, data[i:i+32])

def emit_literal_block(out: io.FileIO, data: bytes):
	length = len(data) - 1
	assert 0 <= length < (1 << 5)
	out.write(bytes([length]))
	out.write(data)

def emit_distance_block(out: io.FileIO, length: int, distance: int):
	distance -= 1
	assert 0 <= distance < (1 << 13)
	length -= 2
	assert length > 0
	block = bytearray()
	if length >= 7:
		assert length < (1 << 8)
		block.append(length - 7)
		length = 7
	block.insert(0, length << 5 | distance >> 8)
	block.append(distance & 0xFF)
	out.write(block)

def compress(out: io.FileIO, data: bytes, level=2):
	'''
	Takes a byte string and outputs a compressed "Tuya stream".

	Implemented compression levels:
	0 - copy over (no compression, 3.1% overhead)
	1 - eagerly use first length-distance pair found (linear)
	2 - eagerly use best length-distance pair found
	3 - optimal compression (n^3)
	'''
	if level == 0:
		return emit_literal_blocks(out, data)

	W = 2**13 # window size
	L = 256+9 # maximum length
	distance_candidates = lambda: range(1, min(pos, W) + 1)

	def find_length_for_distance(start: int) -> int:
		length = 0
		limit = min(L, len(data) - pos)
		while length < limit and data[pos + length] == data[start + length]:
			length += 1
		return length
	find_length_candidates = lambda: \
		( (find_length_for_distance(pos - d), d) for d in distance_candidates() )
	find_length_cheap = lambda: \
		next((c for c in find_length_candidates() if c[0] >= 3), None)
	find_length_max = lambda: \
		max(find_length_candidates(), key=lambda c: (c[0], -c[1]), default=None)

	if level >= 2:
		suffixes = []; next_pos = 0
		key = lambda n: data[n:]
		find_idx = lambda n: bisect(suffixes, key(n), key=key)
		def distance_candidates():
			nonlocal next_pos
			while next_pos <= pos:
				if len(suffixes) == W:
					suffixes.pop(find_idx(next_pos - W))
				suffixes.insert(idx := find_idx(next_pos), next_pos)
				next_pos += 1
			idxs = (idx+i for i in (+1,-1)) # try +1 first
			return (pos - suffixes[i] for i in idxs if 0 <= i < len(suffixes))

	if level <= 2:
		find_length = { 1: find_length_cheap, 2: find_length_max }[level]
		block_start = pos = 0
		while pos < len(data):
			if (c := find_length()) and c[0] >= 3:
				emit_literal_blocks(out, data[block_start:pos])
				emit_distance_block(out, c[0], c[1])
				pos += c[0]
				block_start = pos
			else:
				pos += 1
		emit_literal_blocks(out, data[block_start:pos])
		return

	# use topological sort to find shortest path
	predecessors = [(0, None, None)] + [None] * len(data)
	def put_edge(cost, length, distance):
		npos = pos + length
		cost += predecessors[pos][0]
		current = predecessors[npos]
		if not current or cost < current[0]:
			predecessors[npos] = cost, length, distance
	for pos in range(len(data)):
		if c := find_length_max():
			for l in range(3, c[0] + 1):
				put_edge(2 if l < 9 else 3, l, c[1])
		for l in range(1, min(32, len(data) - pos) + 1):
			put_edge(1 + l, l, 0)

	# reconstruct path, emit blocks
	blocks = []; pos = len(data)
	while pos > 0:
		_, length, distance = predecessors[pos]
		pos -= length
		blocks.append((pos, length, distance))
	for pos, length, distance in reversed(blocks):
		if not distance:
			emit_literal_block(out, data[pos:pos + length])
		else:
			emit_distance_block(out, length, distance)
	import io
	import base64
	from bisect import bisect
	from struct import pack, unpack


	# MAIN API

	def decode_ir(code: str) -> list[int]:
	'''
	Decodes an IR code string from a Tuya blaster.
	Returns the IR signal as a list of µs durations,
	with the first duration belonging to a high state.
	'''
	payload = base64.decodebytes(code.encode('ascii'))
	payload = decompress(io.BytesIO(payload))

	signal = []
	while payload:
	assert len(payload) >= 2, \
	f'garbage in decompressed payload: {payload.hex()}'
	signal.append(unpack('<H', payload[:2])[0])
	payload = payload[2:]
	return signal

	def encode_ir(signal: list[int], compression_level=2) -> str:
	'''
	Encodes an IR signal (see `decode_tuya_ir`)
	into an IR code string for a Tuya blaster.
	'''
	payload = b''.join(pack('<H', t) for t in signal)
	compress(out := io.BytesIO(), payload, compression_level)
	payload = out.getvalue()
	return base64.encodebytes(payload).decode('ascii').replace('\n', '')


	# DECOMPRESSION

	def decompress(inf: io.FileIO) -> bytes:
	'''
	Reads a "Tuya stream" from a binary file,
	and returns the decompressed byte string.
	'''
	out = bytearray()

	while (header := inf.read(1)):
	L, D = header[0] >> 5, header[0] & 0b11111
	if not L:
	# literal block
	L = D + 1
	data = inf.read(L)
	assert len(data) == L
	else:
	# length-distance pair block
	if L == 7:
	L += inf.read(1)[0]
	L += 2
	D = (D << 8 \| inf.read(1)[0]) + 1
	data = bytearray()
	while len(data) < L:
	data.extend(out[-D:][:L-len(data)])
	out.extend(data)

	return bytes(out)


	# COMPRESSION

	def emit_literal_blocks(out: io.FileIO, data: bytes):
	for i in range(0, len(data), 32):
	emit_literal_block(out, data[i:i+32])

	def emit_literal_block(out: io.FileIO, data: bytes):
	length = len(data) - 1
	assert 0 <= length < (1 << 5)
	out.write(bytes([length]))
	out.write(data)

	def emit_distance_block(out: io.FileIO, length: int, distance: int):
	distance -= 1
	assert 0 <= distance < (1 << 13)
	length -= 2
	assert length > 0
	block = bytearray()
	if length >= 7:
	assert length < (1 << 8)
	block.append(length - 7)
	length = 7
	block.insert(0, length << 5 \| distance >> 8)
	block.append(distance & 0xFF)
	out.write(block)

	def compress(out: io.FileIO, data: bytes, level=2):
	'''
	Takes a byte string and outputs a compressed "Tuya stream".

	Implemented compression levels:
	0 - copy over (no compression, 3.1% overhead)
	1 - eagerly use first length-distance pair found (linear)
	2 - eagerly use best length-distance pair found
	3 - optimal compression (n^3)
	'''
	if level == 0:
	return emit_literal_blocks(out, data)

	W = 2**13 # window size
	L = 256+9 # maximum length
	distance_candidates = lambda: range(1, min(pos, W) + 1)

	def find_length_for_distance(start: int) -> int:
	length = 0
	limit = min(L, len(data) - pos)
	while length < limit and data[pos + length] == data[start + length]:
	length += 1
	return length
	find_length_candidates = lambda: \
	( (find_length_for_distance(pos - d), d) for d in distance_candidates() )
	find_length_cheap = lambda: \
	next((c for c in find_length_candidates() if c[0] >= 3), None)
	find_length_max = lambda: \
	max(find_length_candidates(), key=lambda c: (c[0], -c[1]), default=None)

	if level >= 2:
	suffixes = []; next_pos = 0
	key = lambda n: data[n:]
	find_idx = lambda n: bisect(suffixes, key(n), key=key)
	def distance_candidates():
	nonlocal next_pos
	while next_pos <= pos:
	if len(suffixes) == W:
	suffixes.pop(find_idx(next_pos - W))
	suffixes.insert(idx := find_idx(next_pos), next_pos)
	next_pos += 1
	idxs = (idx+i for i in (+1,-1)) # try +1 first
	return (pos - suffixes[i] for i in idxs if 0 <= i < len(suffixes))

	if level <= 2:
	find_length = { 1: find_length_cheap, 2: find_length_max }[level]
	block_start = pos = 0
	while pos < len(data):
	if (c := find_length()) and c[0] >= 3:
	emit_literal_blocks(out, data[block_start:pos])
	emit_distance_block(out, c[0], c[1])
	pos += c[0]
	block_start = pos
	else:
	pos += 1
	emit_literal_blocks(out, data[block_start:pos])
	return

	# use topological sort to find shortest path
	predecessors = [(0, None, None)] + [None] * len(data)
	def put_edge(cost, length, distance):
	npos = pos + length
	cost += predecessors[pos][0]
	current = predecessors[npos]
	if not current or cost < current[0]:
	predecessors[npos] = cost, length, distance
	for pos in range(len(data)):
	if c := find_length_max():
	for l in range(3, c[0] + 1):
	put_edge(2 if l < 9 else 3, l, c[1])
	for l in range(1, min(32, len(data) - pos) + 1):
	put_edge(1 + l, l, 0)

	# reconstruct path, emit blocks
	blocks = []; pos = len(data)
	while pos > 0:
	_, length, distance = predecessors[pos]
	pos -= length
	blocks.append((pos, length, distance))
	for pos, length, distance in reversed(blocks):
	if not distance:
	emit_literal_block(out, data[pos:pos + length])
	else:
	emit_distance_block(out, length, distance)