@Cqoicebordel
Last active June 3, 2023 09:11
Twitch's json comments to Youtube subtitles converter
#!/usr/bin/python3
import sys
import json
import secrets
import html

# The input file name must be given on the command line
if len(sys.argv) < 2:
    print("You need to add the input json as an argument.")
    sys.exit(1)

with open(sys.argv[1], 'r', encoding='UTF8') as jsonFile:
    data = json.loads(jsonFile.read())

# Time delay in seconds for the chat, to sync it with the video. For me, 7s looks in sync with what the video displays
delta = 7
# How long each line is displayed, in seconds
duration = 10
# Font size. Supposed to be a percentage of the standard size, but not all sizes seem to be available
size = 10
# Background color. The YouTube standard is solid black
background_color = "#000000"
# Opacity of the background color, between 0 and 254. 254 means that the color is not transparent
background_opacity = 200

header = '<?xml version="1.0" encoding="utf-8"?><timedtext format="3"><head><wp id="0" ap="7" ah="0" av="0" /><wp id="1" ap="6" ah="0" av="100" /><ws id="0" ju="2" pd="0" sd="0" /><ws id="1" ju="0" pd="0" sd="0" />'
middle = '</head><body>'
footer = '</body></timedtext>'
# \u200b is a zero-width space, written as an escape so it stays visible in the source
spacer = '<s p="1">\u200b\n\u200b</s>'
output = ''

texts = []
timestamps = []
# The first two pens are reserved, so user pens start at index 2
colors = ["#000000", "#FEFEFE"]
users = ["yt-bug", "white-text"]

for i in data['comments']:
    display_name = i['commenter']['display_name']
    message = i['message']['body']
    if display_name not in users:
        users.append(display_name)
        userIndex = users.index(display_name)
        if i['message']['user_color'] is not None:
            color = i['message']['user_color']
        else:
            # No color set on Twitch: pick a random one
            color = "#"+secrets.token_hex(3)
        #print(display_name+color)
        colors.insert(userIndex, color)
    else:
        userIndex = users.index(display_name)
    texts.append('<s p="'+str(userIndex)+'">'+display_name+':</s><s p="1"> '+html.escape(message)+'</s>')
    # One event when the message appears (True) and one when it disappears (False)
    timestamps.append([i['content_offset_seconds']+delta, True])
    timestamps.append([i['content_offset_seconds']+delta+duration, False])

# One pen per user, carrying that user's color
lengthColors = len(colors)
for i in range(lengthColors):
    header += '<pen id="'+str(i)+'" sz="'+str(size)+'" fc="'+colors[i]+'" fo="254" bc="'+background_color+'" bo="'+str(background_opacity)+'" />'

timestamps.sort(key=lambda x: x[0])

# texts[start:end] are the messages visible between two consecutive events
start = 0
end = 1
length = len(timestamps)
for i in range(1,length):
    combined_text = ""
    for j in range(start,end):
        combined_text += texts[j]
        if j != end-1:
            combined_text += spacer
    timestamp_start = int(timestamps[i-1][0]*1000)
    timestamp_end = int(timestamps[i][0]*1000)
    if start != end:
        output += '<p t="'+str(timestamp_start)+'" d="'+str(timestamp_end-timestamp_start)+'" wp="1" ws="1"><s p="1">\u200b</s>'+combined_text+'<s p="1">\u200b</s></p>'
    if timestamps[i][1]:
        end += 1
    else:
        start += 1

# Write to the file given as second argument, or to the console otherwise
if len(sys.argv) == 3:
    with open(sys.argv[2], 'w', encoding='UTF8') as outputFile:
        outputFile.write(header+middle+output+footer)
else:
    print(header+middle+output+footer)

Python script that converts the JSON export of a Twitch chat to the YouTube Timed Text subtitle format (YTT).

Features

  • Use the users' colors
  • Assign a random color to users who don't have one
  • Display the subtitles in chat order (newest at the bottom), by merging messages whose display times collide
  • Display the chat at the bottom left of the screen
  • Display unicode emojis
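
The chat-order feature can be pictured as a sweep over show and hide events: each time a message appears or expires, a cue is emitted that lists every message visible during that interval, oldest first. The sketch below is a simplified illustration of that idea, not the script's exact code. `merge_overlaps` and the `(start, text)` tuples are hypothetical names, and it assumes the messages are already in chronological order and all share the same display duration.

```python
def merge_overlaps(messages, duration):
    """Return (start, end, visible_texts) cues from (start, text) pairs.

    Assumes `messages` is sorted by start time and every message is shown
    for the same `duration`, so messages expire in the order they appeared.
    """
    # One event when a message appears (True) and one when it expires (False).
    events = []
    for start, _ in messages:
        events.append((start, True))
        events.append((start + duration, False))
    events.sort(key=lambda e: e[0])

    cues = []
    first, last = 0, 0  # messages[first:last] are currently visible
    for i, (time, appears) in enumerate(events):
        # Emit a cue for the interval since the previous event, if any
        # message was visible and the interval is not empty.
        if i > 0 and first != last and time > events[i - 1][0]:
            visible = [text for _, text in messages[first:last]]
            cues.append((events[i - 1][0], time, visible))
        if appears:
            last += 1
        else:
            first += 1
    return cues


cues = merge_overlaps([(0, "a: hi"), (4, "b: hello"), (20, "a: bye")], duration=10)
for cue in cues:
    print(cue)
```

Between seconds 4 and 10 both of the first two messages are visible, so they end up in a single cue with the newer one last.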

Features that will not be added

  • Doesn't show Twitch-only emotes (only their names are shown)
  • Doesn't show users' badges

Usage

Either make the script executable (chmod +x) and run it directly, or run python3 convert-combined-ytt.py twitchChat.json. It writes the output to the console. You can save it to a file by redirecting the output with >:
python3 convert-combined-ytt.py twitchChat.json > twitchChat.ytt

You can also pass a second command line argument to set an output file. This is probably needed if your console/cmd is not in UTF-8. Beware: if the file already exists, it is overwritten without asking for confirmation.
python3 convert-combined-ytt.py twitchChat.json twitchChat.ytt
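
If the silent overwrite is a concern, one possible guard (not part of the script) is Python's exclusive-creation mode `'x'`, which makes `open` raise `FileExistsError` when the target already exists. The sketch below shows the idea in isolation; `write_new_file` is a hypothetical helper name and the file content is a placeholder.

```python
import os
import tempfile


def write_new_file(path, content):
    """Write `content` to `path` as UTF-8, refusing to replace an existing file."""
    try:
        # Mode 'x' creates the file and fails if it already exists.
        with open(path, 'x', encoding='utf-8') as output_file:
            output_file.write(content)
    except FileExistsError:
        return False
    return True


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'twitchChat.ytt')
    first_attempt = write_new_file(target, '<timedtext format="3"></timedtext>')
    second_attempt = write_new_file(target, 'something else')
    print(first_attempt, second_attempt)
```

The first call creates the file; the second one leaves it untouched and reports the refusal.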

Notes

You can use Twitch Downloader to download the JSON of the Twitch chat: TwitchDownloaderCLI -m ChatDownload -u videoID -o videoID.json
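
For reference, the script only reads a handful of fields from each entry of the `comments` array. The sketch below shows a minimal input containing just those fields and a small helper that pulls them out. The field names come from the script; the sample values and the `extract_rows` helper are invented for illustration, and a real export is expected to contain many more fields.

```python
import json

# Invented sample data; only the field names come from the script.
sample = json.loads("""
{
  "comments": [
    {
      "content_offset_seconds": 12.5,
      "commenter": {"display_name": "viewer_one"},
      "message": {"body": "hello <3", "user_color": "#1E90FF"}
    },
    {
      "content_offset_seconds": 15,
      "commenter": {"display_name": "viewer_two"},
      "message": {"body": "hi", "user_color": null}
    }
  ]
}
""")


def extract_rows(data):
    """Return (offset_seconds, display_name, body, color_or_None) tuples."""
    rows = []
    for comment in data['comments']:
        rows.append((
            comment['content_offset_seconds'],
            comment['commenter']['display_name'],
            comment['message']['body'],
            comment['message']['user_color'],
        ))
    return rows


rows = extract_rows(sample)
for row in rows:
    print(row)
```

A `null` color in the JSON becomes `None` in Python, which is the case where the script falls back to a random color.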

I also used YTSubConverter extensively to understand the YTT format. Note that they implemented at my request a feature to handle the reverse order of Collision in .ass format, allowing for chat order of subtitles (new at bottom).
This tool is great, so don't hesitate to use it if it fits your needs more.

While writing this script, I also made JSON to .ssa and JSON to .ass converters. If you want them, ask and I'll add them.

The script may have bugs: I didn't test it extensively, but it looks good as far as I know. Don't hesitate to comment if you find any.

@Yoon0618

The conversion function works well, thanks.
However, there is a problem with the codec.

With the original code, I got this:

D:\Downloads>python convert-combined-ytt.py test2.json
Traceback (most recent call last):
  File "D:\Downloads\convert-combined-ytt.py", line 17, in <module>
    data = json.loads(jsonFile.read())
UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 390: illegal multibyte sequence

So I edited the code to:

with open(sys.argv[1], 'r', encoding='UTF8') as jsonFile:
	data = json.loads(jsonFile.read())

And it worked! But it did not work when saving to a file:

D:\Downloads>python convert-combined-ytt.py test2.json > test2.ytt
Traceback (most recent call last):
  File "D:\Downloads\convert-combined-ytt.py", line 89, in <module>
    print(header+middle+output+footer)
UnicodeEncodeError: 'cp949' codec can't encode character '\u200b' in position 3600: illegal multibyte sequence

I'm not good at coding, so that's all I can do.
And this is my JSON: link

@Cqoicebordel
Author

Thanks :)

Ah I think it's a Windows problem :
I'm on Linux, and everything, from the files to the console is in UTF-8. I suspect that the console/cmd in Windows is not in UTF-8 and so, it can create issues. It looks like you are in Windows-949, or Unified Hangul Code.

In any case, there are two fairly easy solutions to this issue: either you switch to UTF-8 (just kidding), or the script writes to a file instead of the console.
So I replaced the last line

print(header+middle+output+footer)

by

if len(sys.argv) == 3:
	with open(sys.argv[2], 'w', encoding='UTF8') as outputFile:
		outputFile.write(header+middle+output+footer)
else:
	print(header+middle+output+footer)

which means that if there is a second argument on the command line, it is used as the output file. Beware: it will write to it even if the file already exists.

I'll update the files above to reflect that change, and I'll add the encoding as well.
Please tell me if it works :)
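
Another option, which the script does not use, might be to force UTF-8 on standard output so that redirecting with > also works on a non-UTF-8 console. The sketch below assumes Python 3.7 or later, where `sys.stdout` normally provides `reconfigure()`; it has not been tested on a cp949 console, and `sample_line` is only an illustrative string.

```python
import sys

# sys.stdout is normally an io.TextIOWrapper, which has reconfigure()
# since Python 3.7. The hasattr check covers replaced stdout objects.
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')

# U+200B (zero-width space) is the character cp949 could not encode.
sample_line = '<s p="1">\u200b</s>'
print(sample_line)
```

With this in place, the bytes written to the console or to a redirected file are UTF-8 regardless of the console code page, under the assumptions above.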

@Yoon0618

D:\Downloads>python convert-combined-ytt.py test.json test.ytt
You need to only add the input json as an argument.

So I deleted lines 11-13, and now it works perfectly!

@Cqoicebordel
Author

Ah yes, I forgot to change it here, thanks !
And thanks for the confirmation !

@KaMyKaSii

KaMyKaSii commented Apr 24, 2022

I tried your script, but the first thing I noticed was that my old notebook became extremely slow when I tried to upload the subtitle file; apparently YouTube uses the browser's local processing to load each subtitle line. With a smaller processed .json I was able to upload it to YouTube, but the result in the player looked the same as standard YouTube subtitles (a single centered line), with overlaps because the Twitch timestamps are out of order. Is it not possible to do as in the example video that was sent? Using yt-dlp on the link, it is possible to download the video's custom subtitles in the formats "vtt, ttml, srv3, srv2, srv1, json3"; I just don't know which format was originally uploaded for the video. Also, can you send the converters for the other formats? Thank you

@Cqoicebordel
Author

So.
Yeah, YouTube does everything locally, and it's a pain because it uses a lot of CPU. But I can't do anything about that :/
About the preview, yeah, I know, YouTube only shows the classic subtitle layout there. But keep in mind that the preview tool is only meant for syncing the subtitles. If you upload the YTT, it will look like the example video. You can try on a private video to be sure, if you like, but I confirm it works.
I will eventually add the other converters, but keep in mind that they don't work for YouTube. They are just for local playback.
YTT (and some sort of JSON) are the only formats that work on YouTube.
