@majora2007
Last active February 1, 2024 20:26
Remove foreign language audio and subtitle tracks from mkv files in bulk
#!/usr/bin/python3
# Removes non-LANG audio tracks and subtitles from mkv files in a directory.
# Original script by greenbender at https://forum.videohelp.com/threads/343271-BULK-remove-non-English-tracks-from-MKV-container
# Modified by Joseph Milazzo for updated MkvMerge commands.
# 12/3/2021: Updated to Python 3.9

import os
import re
import sys
import subprocess
import json

# change this for other languages (3 character code)
LANG = "eng"

# set this to the path for mkvmerge
MKVMERGE = "mkvmerge.exe"

# legacy patterns for mkvmerge's text output; unused now that track info is read from "mkvmerge -J" JSON
AUDIO_RE = re.compile(
    r"Track ID (\d+): audio \([A-Z0-9_/]+\) \[number:\d+ uid:\d+ codec_id:[A-Z0-9_/]+ codec_private_length:\d+ language:([a-z]{3})")
SUBTITLE_RE = re.compile(
    r"Track ID (\d+): subtitles \([A-Z0-9_/]+\) \[number:\d+ uid:\d+ codec_id:[A-Z0-9_/]+ codec_private_length:\d+ language:([a-z]{3})(?: track_name:\w*)? default_track:[01]{1} forced_track:([01]{1})")

if len(sys.argv) < 2:
    print("Please supply an input directory")
    sys.exit()

in_dir = sys.argv[1]

for root, dirs, files in os.walk(in_dir):
    for f in files:

        # filter out non mkv files
        if not f.endswith(".mkv"):
            continue

        # path to file
        path = os.path.join(root, f)

        # build command line
        cmd = [MKVMERGE, "-J", path]

        # get mkv info
        mkvmerge = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = mkvmerge.communicate()
        if mkvmerge.returncode != 0:
            print("mkvmerge failed to identify", path, file=sys.stderr)
            continue

        # find audio and subtitle tracks
        audio = []
        subtitle = []
        info_json = json.loads(stdout)
        tracks = info_json['tracks']
        for track in tracks:
            track['properties']['id'] = track['id']
            if track['type'] == 'audio':
                audio.append(track)
            elif track['type'] == 'subtitles':
                subtitle.append(track)

        # filter out files that don't need processing
        if len(audio) < 2 and len(subtitle) < 2:
            print("nothing to do for", path, file=sys.stderr)
            continue

        # filter out tracks that don't match the language
        # (list() is required: in Python 3, filter() returns an iterator with no len())
        audio_lang = list(filter(lambda a: a['properties']['language'] == LANG, audio))
        subtitle_lang = list(filter(lambda a: a['properties']['language'] == LANG, subtitle))

        # filter out files that don't need processing
        if len(audio_lang) == 0 and len(subtitle_lang) == 0:
            print("no tracks with that language in", path, file=sys.stderr)
            continue

        # build command line
        cmd = [MKVMERGE, "-o", path + ".temp"]
        if len(audio_lang):
            cmd += ["--audio-tracks", ",".join([str(a['id']) for a in audio_lang])]
            for i in range(len(audio_lang)):
                cmd += ["--default-track", ":".join([str(audio_lang[i]['id']), "0" if i else "1"])]
        if len(subtitle_lang):
            cmd += ["--subtitle-tracks", ",".join([str(s['id']) for s in subtitle_lang])]
            for i in range(len(subtitle_lang)):
                cmd += ["--default-track", ":".join([str(subtitle_lang[i]['id']), "0"])]
        cmd += [path]

        # process file
        print("Processing", path, "...", end=" ", file=sys.stderr)
        mkvmerge = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = mkvmerge.communicate()
        if mkvmerge.returncode != 0:
            print("Failed", file=sys.stderr)
            print(stdout)
            continue

        print("Succeeded", file=sys.stderr)

        # overwrite file
        os.remove(path)  # Don't overwrite
        os.rename(path + ".temp", path)

@lindelux

lindelux commented Dec 3, 2021

Could you update this for Python 3.8 please?

@majora2007
Author

@lindelux Done

@vidarak

vidarak commented Jan 3, 2022

Nice! It would be useful to specify more than one language to keep. (I'd like to keep both Norwegian and English)

@majora2007
Author

@vidarak You should be able to easily do that by changing

subtitle_lang = filter(lambda a: a['properties']['language'] == LANG, subtitle)
to
subtitle_lang = filter(lambda a: a['properties']['language'] in LANG, subtitle)

and have LANG = ["eng", "nor"] ("nor" is the three-letter code for Norwegian). The audio_lang line needs the same change.
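
As a minimal sketch of the change described above (the helper name keep_by_language and the sample track dictionaries are made up for illustration; only the 'properties' / 'language' keys come from the script), filtering against a list of codes could look like this. One thing to watch: if LANG is left as a plain string such as "eng", the in operator does substring matching instead of list membership, so LANG really does need to become a list.

```python
LANG = ["eng", "nor"]  # three-letter codes to keep


def keep_by_language(tracks, langs):
    """Return the tracks whose properties.language is one of langs (hypothetical helper)."""
    wanted = set(langs)
    return [t for t in tracks if t.get("properties", {}).get("language") in wanted]


# Hand-written sample data shaped like the 'tracks' entries the script reads.
sample = [
    {"id": 1, "type": "audio", "properties": {"language": "eng"}},
    {"id": 2, "type": "audio", "properties": {"language": "jpn"}},
    {"id": 3, "type": "audio", "properties": {"language": "nor"}},
]

kept = keep_by_language(sample, LANG)
print([t["id"] for t in kept])
```

Returning a list (not a filter object) also means len() works on the result later in the script.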

@vidarak

vidarak commented Jan 3, 2022

Thanks! I found that it didn't work with Python 3.9 and the additional language needed some extra comparisons. I'm not a developer but think I managed to fix it....

#!/usr/bin/python3
# Removes non-LANG audio tracks and subtitles from mkv files in a directory.
# Original script by greenbender at https://forum.videohelp.com/threads/343271-BULK-remove-non-English-tracks-from-MKV-container
# Modified by Joseph Milazzo for updated MkvMerge commands.
# 12/3/2021: Updated to Python 3.9

import os
import re
import sys
import subprocess
import json

def print_to_stderr(*a):
    print(*a, file = sys.stderr)

# change this for other languages (3 character code)
LANG = ["eng","nor"]

# set this to the path for mkvmerge
MKVMERGE = "/usr/bin/mkvmerge"

AUDIO_RE = re.compile(
    r"Track ID (\d+): audio \([A-Z0-9_/]+\) \[number:\d+ uid:\d+ codec_id:[A-Z0-9_/]+ codec_private_length:\d+ language:([a-z]{3})")
SUBTITLE_RE = re.compile(
    r"Track ID (\d+): subtitles \([A-Z0-9_/]+\) \[number:\d+ uid:\d+ codec_id:[A-Z0-9_/]+ codec_private_length:\d+ language:([a-z]{3})(?: track_name:\w*)? default_track:[01]{1} forced_track:([01]{1})")

if len(sys.argv) < 2:
    print("Please supply an input directory")
    sys.exit()

in_dir = sys.argv[1]

for root, dirs, files in os.walk(in_dir):
    for f in files:

        # filter out non mkv files
        if not f.endswith(".mkv"):
            continue

        # path to file
        path = os.path.join(root, f)

        # build command line
        cmd = [MKVMERGE, "-J", path]

        # get mkv info
        mkvmerge = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = mkvmerge.communicate()
        if mkvmerge.returncode != 0:
            print_to_stderr("mkvmerge failed to identify "+ path)
            continue

        # find audio and subtitle tracks
        audio = []
        subtitle = []
        info_json = json.loads(stdout)
        tracks = info_json['tracks']
        for track in tracks:
            track['properties']['id'] = track['id']
            if track['type'] == 'audio':
                audio.append(track)
            elif track['type'] == 'subtitles':
                subtitle.append(track)

        # filter out files that don't need processing
        if len(audio) < 2 and len(subtitle) < 2:
            print_to_stderr("nothing to do for " + path)
            continue

        # filter out tracks that don't match the language
        audio_lang = list(filter(lambda a: a['properties']['language'] in LANG, audio))
        subtitle_lang = list(filter(lambda a: a['properties']['language'] in LANG, subtitle))

        # filter out files that don't need processing
        if audio_lang == audio and subtitle_lang == subtitle:
            print_to_stderr("nothing to do for " + path)
            continue

        # filter out files that don't need processing
        if len(audio_lang) == 0 and len(subtitle_lang) == 0:
            print_to_stderr("no tracks with that language in " + path)
            continue

        # build command line
        cmd = [MKVMERGE, "-o", path + ".temp"]
        if len(audio_lang):
            cmd += ["--audio-tracks", ",".join([str(a['id']) for a in audio_lang])]
            for i in range(len(audio_lang)):
                cmd += ["--default-track", ":".join([str(audio_lang[i]['id']), "0" if i else "1"])]
        if len(subtitle_lang):
            cmd += ["--subtitle-tracks", ",".join([str(s['id']) for s in subtitle_lang])]
            for i in range(len(subtitle_lang)):
                cmd += ["--default-track", ":".join([str(subtitle_lang[i]['id']), "0"])]
        cmd += [path]

        # process file
        print_to_stderr("Processing " + path + "...")
        mkvmerge = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = mkvmerge.communicate()
        if mkvmerge.returncode != 0:
            print_to_stderr("Failed")
            print(stdout)
            continue

        print_to_stderr("Succeeded")

        # overwrite file
        os.remove(path)  # Don't overwrite
        os.rename(path + ".temp", path)
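
A possible refinement of the last step, offered as a sketch and not as part of the script above: os.replace() swaps the temp file into place in a single call and overwrites the destination on both Windows and POSIX, so there is no moment where the original has been removed but the new file is not yet renamed. The swap_in name and the throwaway files are assumptions for the demo.

```python
import os
import tempfile


def swap_in(temp_path, final_path):
    """Replace final_path with temp_path in one step (hypothetical helper)."""
    os.replace(temp_path, final_path)


# Demo on throwaway text files standing in for the .mkv and its .temp output.
with tempfile.TemporaryDirectory() as d:
    final = os.path.join(d, "movie.mkv")
    temp = final + ".temp"
    with open(final, "w") as fh:
        fh.write("old")
    with open(temp, "w") as fh:
        fh.write("new")

    swap_in(temp, final)

    with open(final) as fh:
        result = fh.read()
    leftover = os.path.exists(temp)

print(result, leftover)
```

Both paths need to be on the same filesystem for this to work, which holds here because the script writes path + ".temp" next to the original.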

@seantcanavan

@majora2007 what are the licensing terms of this gist?

@vidarak what are the licensing terms for your edit of @majora2007's gist?

@vidarak

vidarak commented Nov 9, 2022

free to use for whatever. no terms.

@majora2007
Author

Licensing is free to use as well. There was no license on the code I found floating around.

@seantcanavan

Thank you for the fast responses. According to https://stackoverflow.com/a/71817350, you're supposed to ask the authors if no license is provided.

@i0moe

i0moe commented Nov 20, 2022

I get this error when I run it. How can I fix it?

moe@moeserver:/hdd/TEST$ python3 removeNonEnglish.py /hdd/TEST
Traceback (most recent call last):
  File "/hdd/TEST/removeNonEnglish.py", line 90, in <module>
    print >> sys.stderr, "Processing", path, "...",
TypeError: unsupported operand type(s) for >>: 'builtin_function_or_method' and '_io.TextIOWrapper'. Did you mean "print(<message>, file=<output_stream>)"?
moe@moeserver:/hdd/TEST$

@majora2007
Author

@i0moe That error means the code still contains Python 2 syntax: each "print >> sys.stderr, ..." statement needs to be rewritten as a Python 3 print() call with file=sys.stderr.
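
For illustration, a small sketch of that conversion (the log helper and the example path are made up here; vidarak's version above uses a similar print_to_stderr wrapper). In Python 2 a trailing comma suppressed the newline, which in Python 3 corresponds to passing end.

```python
import sys


def log(*parts, end="\n"):
    """Write a message to stderr (hypothetical helper)."""
    print(*parts, end=end, file=sys.stderr)


path = "example.mkv"  # placeholder path for the demo

# Python 2:  print >> sys.stderr, "Processing", path, "...",
log("Processing", path, "...", end=" ")

# Python 2:  print >> sys.stderr, "Succeeded"
log("Succeeded")
```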

@vidarak

vidarak commented Nov 20, 2022

@i0moe It's been a while, but that's probably fixed in my modified script which worked for me on python 3.9?

@conign

conign commented Mar 29, 2023

@majora2007 would you please update this script to keep multiple audio/subtitles tracks in a specified language?

@seantcanavan

seantcanavan commented Mar 29, 2023

> @majora2007 would you please update this script to keep multiple audio/subtitles tracks in a specified language?

Check this line to reconfigure the languages to keep: LANG = ["eng","nor"]

If you need to differentiate between subtitle and audio languages you can check my fork of the file at https://github.com/seantcanavan/apex-plex/blob/main/main.py

The lines in question are:

# change this for other languages (3 character code)
AUDIO_LANG = ["eng", "jpn"]
SUBTITLE_LANG = ["eng"]
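
To show how the two lists might be used, here is a rough sketch. It is not taken from the linked fork; select_tracks, track_args and the sample data are hypothetical, and only the --audio-tracks / --subtitle-tracks options and the track dictionary keys come from the script earlier in this thread.

```python
AUDIO_LANG = ["eng", "jpn"]
SUBTITLE_LANG = ["eng"]


def select_tracks(tracks, audio_langs, subtitle_langs):
    """Split tracks into (audio, subtitles) to keep, each filtered by its own language list."""
    audio, subs = [], []
    for t in tracks:
        lang = t.get("properties", {}).get("language")
        if t["type"] == "audio" and lang in audio_langs:
            audio.append(t)
        elif t["type"] == "subtitles" and lang in subtitle_langs:
            subs.append(t)
    return audio, subs


def track_args(audio, subs):
    """Build the track-selection part of the mkvmerge command line."""
    args = []
    if audio:
        args += ["--audio-tracks", ",".join(str(t["id"]) for t in audio)]
    if subs:
        args += ["--subtitle-tracks", ",".join(str(t["id"]) for t in subs)]
    return args


# Hand-written sample shaped like the 'tracks' list the script reads.
sample = [
    {"id": 0, "type": "video", "properties": {"language": "und"}},
    {"id": 1, "type": "audio", "properties": {"language": "eng"}},
    {"id": 2, "type": "audio", "properties": {"language": "jpn"}},
    {"id": 3, "type": "audio", "properties": {"language": "fre"}},
    {"id": 4, "type": "subtitles", "properties": {"language": "eng"}},
    {"id": 5, "type": "subtitles", "properties": {"language": "jpn"}},
]

audio, subs = select_tracks(sample, AUDIO_LANG, SUBTITLE_LANG)
print(track_args(audio, subs))
```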

@tawmoto

tawmoto commented May 30, 2023

Unfortunately it does not work on python 3.11

  File "script.py", line 73, in <module>
    if len(audio_lang) == 0 and len(subtitle_lang) == 0:
       ^^^^^^^^^^^^^^^
TypeError: object of type 'filter' has no len()
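
This error is not specific to 3.11: in any Python 3 version, filter() returns a lazy iterator and not a list, so len() fails on it. A minimal reproduction and one possible fix, using made-up sample tracks:

```python
LANG = "eng"

# Hand-written sample data for the demo.
audio = [
    {"id": 1, "properties": {"language": "eng"}},
    {"id": 2, "properties": {"language": "jpn"}},
]

# Reproduce the failure: a filter object has no length.
lazy = filter(lambda a: a["properties"]["language"] == LANG, audio)
error = None
try:
    len(lazy)
except TypeError as exc:
    error = str(exc)
print(error)

# One fix: build a real list (list(filter(...)) would work as well).
audio_lang = [a for a in audio if a["properties"]["language"] == LANG]
print(len(audio_lang))
```

The same change would be needed for subtitle_lang, since both are passed to len() and indexed later on.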

@lodesmets

lodesmets commented Feb 1, 2024

Hey,
It doesn't seem to work: os.walk(in_dir) seems to return nothing. All my mkv files are in the 'Z:\Season 1' folder, but the program doesn't find any files.

It seems it is a problem if the files are on a network drive

@seantcanavan

> Hey, It doesn't seem to work, os.walk(in_dir): seems to return nothing, all my mkv files are in 'Z:\Season 1' folder But the program doesn't find any files
>
> It seems it is a problem if the files are on a network drive

There are a lot of issues in general when working across operating systems and file systems over the network.

I once spent hours debugging a problem while running Python on Windows and accessing files on a Linux shared drive. Windows is not case-sensitive and Linux is, which was a huge learning moment for me.

I would encourage you to try running the script on the same platform as your network drive, or to dig further into the exact file systems on both devices and why they might cause the issue you are seeing.
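
A small diagnostic sketch that may help narrow this down (find_mkv_files is a hypothetical helper, not part of the gist). By default os.walk() silently skips directories it cannot read, so an unreachable path looks exactly like an empty one; passing onerror makes those failures visible. Printing repr(in_dir) also shows whether the argument arrived intact, since an unquoted path containing a space would be split by the shell (that being the cause here is only a guess). The extension check is made case-insensitive so names like "Episode.MKV" are not skipped.

```python
import os
import sys


def find_mkv_files(in_dir):
    """Return paths of .mkv files under in_dir, reporting anything os.walk cannot read."""
    if not os.path.isdir(in_dir):
        print("Not a directory, or not reachable:", repr(in_dir), file=sys.stderr)
        return []

    def report(err):
        # err is the OSError that os.walk would otherwise swallow
        print("os.walk could not read", err.filename, "-", err.strerror, file=sys.stderr)

    found = []
    for root, dirs, files in os.walk(in_dir, onerror=report):
        for name in files:
            if name.lower().endswith(".mkv"):
                found.append(os.path.join(root, name))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in find_mkv_files(sys.argv[1]):
        print(p)
```

If this prints the "not reachable" message for a path that opens fine in a file browser, the problem is likely in how the drive is visible to the Python process and not in the script's logic.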
