@ES-Alexander
Last active November 23, 2023 01:12
SubStation Alpha (SSA/ASS) embedded file encoder
#!/usr/bin/env python3

def parse(file):
    ''' Generates encoded characters from file byte data.

    Encoding is suitable for embedded [Graphics] in SubStation Alpha files.
    See here for encoding specification and other details:
    http://www.tcax.org/docs/ass-specs.htm

    Bytes are split into groups of 6 bits, then 33 is added to each group
    (which ensures all encoded bytes are printable ASCII characters, and not
    lower-case).
    Most bytes are handled in groups of 3, since 3*8 = 24 bits = 4 groups of 6.
    If the file length is not a multiple of 3 bytes the remaining one or two
    bytes are left-shifted, with zero-bits added on the right to get an even
    multiple of 6 bits, which can then be split and encoded normally.

    '''
    encoded_bits = 6
    while (data := file.read(3)):
        if (offset := (len(data) % 3)) != 0:
            if offset == 1:
                # 1 remainder byte: 8 bits + 4 zero bits = 12 bits
                joined = data[0] << 4
                encoded_characters = 2
            else:
                # 2 remainder bytes: 16 bits + 2 zero bits = 18 bits
                joined = data[0] << 10 | data[1] << 2
                encoded_characters = 3
        else:
            # 3 bytes (normal): 24 bits, no padding needed
            joined = data[0] << 16 | data[1] << 8 | data[2]
            encoded_characters = 4

        # yield one encoded character at a time, most significant group first
        yield from (chr(((joined >> split) & 0b11_1111) + 33)
                    for split in range((encoded_characters - 1) * encoded_bits,
                                       -1, -encoded_bits))


def to_lines(parser):
    ''' Yields encoded characters in 80-character lines. '''
    for index, value in enumerate(parser):
        if index and index % 80 == 0:
            yield '\n'
        yield value
    yield '\n'


if __name__ == '__main__':
    from pathlib import Path
    from argparse import ArgumentParser

    parser = ArgumentParser(
        description='Advanced SubStation embedded file encoder')
    parser.add_argument('input_filename', type=Path)
    args = parser.parse_args()

    valid_file_types = ('.bmp', '.jpg', '.gif', '.ico', '.wmf', '.ttf')
    assert (file_type := args.input_filename.suffix) in valid_file_types, \
        f'Unsupported {file_type = } - must be one of {valid_file_types}'

    print('[Graphics]')
    print('filename:', args.input_filename.name)
    with open(args.input_filename, 'rb') as file:
        print(''.join(to_lines(parse(file))))
@ES-Alexander

Requires Python >= 3.8.
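
As a quick check of the 6-bit grouping described in the docstring, here is a minimal standalone sketch (separate from the gist file) that encodes a byte string with the same scheme and decodes it again. The names `ssa_encode` and `ssa_decode` are hypothetical, and the decoder is an assumption obtained by inverting the encoding, not something taken from the spec.

```python
def ssa_encode(data: bytes) -> str:
    """Encode bytes as 6-bit groups offset by 33 (no line wrapping)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # 1, 2 or 3 bytes become 2, 3 or 4 characters respectively.
        n_chars = len(chunk) + 1
        # Pad with zero bits on the right up to a multiple of 6 bits
        # (8 -> 12, 16 -> 18, 24 -> 24).
        joined = int.from_bytes(chunk, 'big') << (n_chars * 6 - len(chunk) * 8)
        for shift in range((n_chars - 1) * 6, -1, -6):
            out.append(chr(((joined >> shift) & 0b11_1111) + 33))
    return ''.join(out)


def ssa_decode(text: str) -> bytes:
    """Invert ssa_encode; newlines from 80-column wrapping are ignored."""
    chars = text.replace('\n', '')
    out = bytearray()
    for i in range(0, len(chars), 4):
        group = chars[i:i + 4]
        joined = 0
        for c in group:
            joined = (joined << 6) | (ord(c) - 33)
        n_bytes = len(group) - 1
        # Drop the zero padding bits that were added on the right.
        joined >>= len(group) * 6 - n_bytes * 8
        out += joined.to_bytes(n_bytes, 'big')
    return bytes(out)


print(ssa_encode(b'Man'))  # full 3-byte group
print(ssa_encode(b'Ma'))   # 2 remainder bytes
print(ssa_encode(b'M'))    # 1 remainder byte
```

A round trip such as `ssa_decode(ssa_encode(payload)) == payload` for payloads of every length modulo 3 is a cheap way to exercise both remainder branches, which a single test file might not hit.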

Details/Caveats

Outputs the same encoded result as Aegisub, so I assume it's implemented correctly. Unfortunately Aegisub doesn't support the SSA Picture event, so it can't be tested there. From the Aegisub Attachment Manager docs:

The SSA format specification only allows certain filetypes to be attached. For fonts, only .ttf is allowed. For pictures, .bmp, .gif, .ico, .jpg and .wmf are allowed (note the absence of .png). None of the SSA commands which use the images are implemented in anything but SubStation Alpha, so it is very unlikely that attaching pictures is actually a useful thing to do.

Those docs also mention that

Many SSA/ASS editors ignore or strip attachments. The original SubStation Alpha (v4.08) will work fine, but only for real SSA files. Sabbu will complain about unrecognized fields, and strip the attachments if you save the file. Most other editors either ignore the attachments or crash when encountering them.

A notable exception is mkvmerge, which will convert the attached files to Matroska attachments on muxing. If you demux the script again, the attachments will be stripped from the script, but they're still there as MKV attachments.

so it would seem that either SubStation Alpha or mkvmerge is required for handling embedded files.

I'm unsure whether mkvmerge can handle Picture events, so I will need to try that. I'm also not sure whether newer editors like SubtitleEdit can handle them. Assuming I formatted the line correctly, I believe VLC doesn't. It also doesn't appear to render embedded fonts, so it likely can't handle embedded images even if it does support Pictures (although I also tried the file-path method, which didn't work either).

It would likely be sufficient to burn in the subtitles (image(s) included) with HandBrake, but it also didn't seem to support the Picture events I tried.

Alternatives

If I can't get SubStation Alpha or mkvmerge working then perhaps images will need to be included using the vector-graphics drawing commands and style override codes specified in the ASS format. I suppose it's also technically possible to automatically convert a normal image into a vectorised format with individual squares/rectangles representing the pixels, but that's likely ill-advised for performance and memory reasons.
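
To give a sense of what the pixel-rectangle approach might look like, here is a rough sketch that turns a small grid of RGB pixels into ASS drawing text, merging horizontal runs of the same colour into one rectangle. The function name `pixels_to_ass` and the in-memory grid input are assumptions for illustration (a real converter would read an image file). The `\an7\pos(0,0)` prefix assumes the renderer then treats drawing coordinates as script coordinates, which has not been checked here.

```python
from collections import defaultdict


def pixels_to_ass(pixels, size=1):
    """Return one ASS drawing text per colour for a grid of (r, g, b) pixels.

    `size` is the edge length of one pixel in script units.
    """
    rects = defaultdict(list)  # colour -> list of rectangle commands
    for y, row in enumerate(pixels):
        x = 0
        while x < len(row):
            colour = row[x]
            start = x
            # Extend the run while the colour stays the same.
            while x < len(row) and row[x] == colour:
                x += 1
            x0, x1 = start * size, x * size
            y0, y1 = y * size, (y + 1) * size
            rects[colour].append(
                f'm {x0} {y0} l {x1} {y0} {x1} {y1} {x0} {y1}')

    texts = []
    for (r, g, b), shapes in rects.items():
        # ASS colours are written in blue-green-red order.
        tags = rf'{{\an7\pos(0,0)\bord0\shad0\1c&H{b:02X}{g:02X}{r:02X}&\p1}}'
        texts.append(tags + ' '.join(shapes) + r'{\p0}')
    return texts


RED, BLUE = (255, 0, 0), (0, 0, 255)
for text in pixels_to_ass([[RED, RED], [BLUE, RED]]):
    print(text)
```

Each returned string would go in the Text field of its own Dialogue event. Even with run merging, a photo-like image produces roughly one rectangle per pixel, which is the performance and memory concern mentioned above.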

@ES-Alexander

Drawing tools seem to work, including on VLC. Straight lines are easy/intuitive, but Bézier curves are hard to create without a GUI or some automation.
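
As an example of the kind of automation that helps with curves, the sketch below approximates a circle with four cubic Bézier segments using the ASS `m` (move) and `b` (cubic Bézier) drawing commands. The function name `circle_drawing` is hypothetical, the control-point constant is the usual quarter-circle approximation, and coordinates are rounded to integers on the assumption that not every renderer accepts fractional values.

```python
# Control-point distance (as a fraction of the radius) that makes a cubic
# Bezier closely follow a quarter circle: 4/3 * (sqrt(2) - 1).
KAPPA = 0.5522847498


def circle_drawing(cx, cy, r):
    """Return ASS drawing commands approximating a circle of radius r."""
    k = KAPPA * r
    start = (cx + r, cy)
    # Each entry is (control point 1, control point 2, end point),
    # going around the circle one quarter at a time.
    quarters = [
        ((cx + r, cy + k), (cx + k, cy + r), (cx, cy + r)),
        ((cx - k, cy + r), (cx - r, cy + k), (cx - r, cy)),
        ((cx - r, cy - k), (cx - k, cy - r), (cx, cy - r)),
        ((cx + k, cy - r), (cx + r, cy - k), (cx + r, cy)),
    ]

    def fmt(point):
        return f'{round(point[0])} {round(point[1])}'

    parts = [f'm {fmt(start)}']
    for c1, c2, end in quarters:
        parts.append(f'b {fmt(c1)} {fmt(c2)} {fmt(end)}')
    return ' '.join(parts)


print(circle_drawing(100, 100, 100))
```

The output can be placed between `{\p1}` and `{\p0}` in a Dialogue line. Drawing at a large radius keeps the integer rounding error small relative to the shape.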

Not sure if it's best to make a converter that's tailored to logos and other low-palette images, or to use existing online img->svg converters and make an SVG parser. A custom converter would require code that detects a colour palette (k-means in a perceptual colour space?) and the major shapes/contours in an image, and outputs a valid set of commands (moves/lines/Béziers) to draw them. For the output of either approach, the user could specify a nominal output size, and then specify a scale in the \p<scale> command for using it in smaller videos.
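
For the scaling part, the ASS spec describes `\p<scale>` as dividing drawing coordinates by 2^(scale-1), so a drawing made at a nominal size can only be shrunk in powers of two this way. A small sketch of picking a scale for a target width follows; the helper names `choose_p_scale` and `drawing_text` are hypothetical.

```python
import math


def choose_p_scale(nominal_width, target_width):
    """Smallest p-scale whose rendered width does not exceed target_width."""
    if nominal_width <= 0 or target_width <= 0:
        raise ValueError('widths must be positive')
    if target_width >= nominal_width:
        return 1  # scale 1 renders coordinates 1:1
    # Each step up in scale halves the rendered size.
    return 1 + math.ceil(math.log2(nominal_width / target_width))


def drawing_text(commands, p_scale=1):
    """Wrap drawing commands in the tags that switch drawing mode on and off."""
    return rf'{{\p{p_scale}}}{commands}{{\p0}}'


scale = choose_p_scale(1920, 480)
print(scale, 1920 / 2 ** (scale - 1))
print(drawing_text('m 0 0 l 8 0 8 8 0 8', scale))
```

Sizes that are not a power-of-two fraction of the nominal size would need the coordinates themselves rescaled, or some other mechanism, which this sketch does not cover.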

@ES-Alexander

ES-Alexander commented Oct 5, 2021

Installed SubStation Alpha but couldn't get it to work with embedded files, or Pictures more generally. Also confirmed that HandBrake doesn't seem capable of handling them or burning them in. At least the embedding code was fun to write.

Seems like vector-based drawings are the way to go, since they actually work across the main modern software available.
