@kowalcj0
Last active July 30, 2023 10:09
Extract all subtitles from a movie using ffprobe & ffmpeg
alias subs=subs
function subs() {
    movie="${1}"
    filename="${1%.*}"
    # One "index,language" line per subtitle stream, e.g. "3,eng"
    mappings=`ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}"`
    OLDIFS=$IFS
    IFS=,
    ( while read idx lang
    do
        echo "Extracting ${lang} subtitle #${idx} from ${movie}"
        ffmpeg -nostdin -hide_banner -loglevel quiet -i "${movie}" -map 0:"$idx" "${filename}_${lang}_${idx}.srt"
    done <<< "${mappings}" )
    IFS=$OLDIFS
}
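
To see how the loop splits ffprobe's output without needing a video file, here is a minimal sketch that feeds a made-up stream list (`2,eng` and `3,fre`) through the same `IFS=, read` pattern. `parse_mappings` is a hypothetical helper name used only for illustration; it is not part of the gist.

```bash
# Hypothetical helper: for each "index,language" line on stdin, print
# the output file name the gist's function would pass to ffmpeg.
parse_mappings() {
    base="$1"
    while IFS=, read -r idx lang
    do
        echo "${base}_${lang}_${idx}.srt"
    done
}

# Sample data standing in for real ffprobe output (an assumption).
printf '2,eng\n3,fre\n' | parse_mappings "Movie-1080p"
```

Each input line yields one file name, so a file with two subtitle streams should produce two ffmpeg calls.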
@dausruddin

Thank you. I don't know why this does nothing on my system, though. Perhaps something changed in my bash version, or perhaps I just don't know how to run this script.
Anyway, I had to modify it into this to make it work.

function subs() {
    movie="${1}"
    filename="${1%.*}"
    mappings=`ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}"`
    OLDIFS=$IFS
    IFS=,
    ( while read idx lang
    do
        echo "Extracting ${lang} subtitle #${idx} from ${movie}"
        ffmpeg -nostdin -hide_banner -loglevel quiet -i "${movie}" -map 0:"$idx" "${filename}_${lang}_${idx}.srt"
    done <<< "${mappings}" )
    IFS=$OLDIFS
}

subs "$1"

Example of command: bash subs.sh Movie-1080p.mkv

@kowalcj0
Author

I just tested it with bash 5.1.0 and ffmpeg 4.4 and it worked just fine.

Can you try to do it manually?

Start with getting mapping of subtitle indexes to languages:

mappings=`ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "Movie-1080p.mkv"`
echo ${mappings}

It should show something like 3,eng.

Then substitute ${idx} and ${lang} with the mapping values. For example, for 3,eng:

ffmpeg -nostdin -hide_banner -loglevel quiet -i "Movie-1080p.mkv" -map 0:3 "Movie-1080p_eng_3.srt"

If it works, then you'll need to wrap it in a function or a script that works with your shell.

@otanim

otanim commented Aug 15, 2021

@kowalcj0 I think it happens simply because your version only defines the function and the alias within the scope of the ./subs.sh script, while @dausruddin's version works in most cases because it directly takes an argument passed from outside the script: ./subs.sh input.mkv.

I tested this using bash v5.0.7 and ffmpeg v4.2.4. Perhaps you're running your version differently, or your bash version allows you to define aliases outside of that scope so you can run the command via an alias.

@kowalcj0
Author

I just realised that the gist's filename was wrong. Usually, I have this function and alias defined in my .bashrc.

Nevertheless, I tried it as a standalone bash script subs.sh which looked like this:

#!/usr/bin/env bash

function subs() {
    movie="${1}"
    filename="${1%.*}"
    mappings=`ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}"`
    OLDIFS=$IFS
    IFS=,
    ( while read idx lang
    do
        echo "Extracting ${lang} subtitle #${idx} from ${movie}"
        ffmpeg -nostdin -hide_banner -loglevel quiet -i "${movie}" -map 0:"$idx" "${filename}_${lang}_${idx}.srt"
    done <<< "${mappings}" )
    IFS=$OLDIFS
}

subs "${1}"

and calling ~/tmp/subs.sh movie.mkv or bash ~/tmp/subs.sh movie.mkv worked just fine.

I've tested it with: bash v5.1.0 & ffmpeg v4.4
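
One common bash idiom lets a single file serve both uses discussed here: sourced from .bashrc (definition only) and run as a standalone script (definition plus call). The sketch below assumes bash; the body of `subs` is a placeholder that only echoes its argument, standing in for the real ffprobe/ffmpeg logic.

```bash
#!/usr/bin/env bash
# Hypothetical subs.sh layout that works both sourced and executed.

subs() {
    # Placeholder body; the real function would call ffprobe/ffmpeg here.
    echo "would extract subtitles from: ${1:-<no file given>}"
}

# When the file is executed, BASH_SOURCE[0] equals $0, so call the function.
# When it is sourced (e.g. from .bashrc), they differ and only the
# definition is loaded.
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    subs "$@"
fi
```

With this layout, `bash subs.sh movie.mkv` runs the function, while `source subs.sh` only makes `subs` available in the current shell.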

@otanim

otanim commented Aug 16, 2021

yeah, it's working in that way 😄 Btw, thanks for the code snippet 👍

@anhthoai

This script only extracts the last subtitle, not all of them.

@otanim

otanim commented Aug 30, 2021

@anhthoai, it does extract all subtitles in my case. I would suggest you check whether the mappings variable contains more than one subtitle entry, or just post the output here so others can assist you with your case.

@anhthoai

subs
Hello, I have 8 subtitles, as shown in the screenshot, but only the last one is extracted.

@otanim

otanim commented Aug 30, 2021

@anhthoai, from your screenshot I can see you're using Windows 10. Please note that shell scripts are intended to run on Unix systems (Linux, macOS). I'm not sure how exactly you executed the script in your environment, but that might be the cause.

On another note, it does seem that in your case the loop that should go through the list did not complete correctly and printed only the last value in the list.

I would suggest you confirm the following:

@anhthoai

Hello,

  • I didn't modify the script at all
  • Even when I use Git Bash, I get the same result
    image

@otanim

otanim commented Aug 30, 2021

@anhthoai, I don't remember any text being printed by the script with the label subtitle #N from FILENAME. Can you show the content of your subs.sh file?

Btw, have you tried extracting subs from a different .mkv file?

@anhthoai

anhthoai commented Aug 30, 2021

Hello,
This is the script:

#!/usr/bin/env bash

function subs() {
    movie="${1}"
    filename="${1%.*}"
    mappings=`ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}"`
    OLDIFS=$IFS
    IFS=,
    ( while read idx lang
    do
        echo "Extracting ${lang} subtitle #${idx} from ${movie}"
        ffmpeg -nostdin -hide_banner -loglevel quiet -i "${movie}" -map 0:"$idx" "${filename}_${lang}_${idx}.srt"
    done <<< "${mappings}" )
    IFS=$OLDIFS
}

subs "${1}"

image

@otanim

otanim commented Aug 30, 2021

@anhthoai, I see...

Not sure about that... The only thought that comes to mind is to try extracting subs on the Linux system and see if the issue repeats or not, or try to upgrade your ffmpeg's version to the latest release or try to extract a different file's subs.

@IsraelOgas

@anhthoai did you find a way to extract all subtitles with this script on Windows 10? I tried to debug the code with my nonexistent knowledge of bash and realized that it only recognizes the last subtitle (lang), and that when the file name includes the lang, the file isn't created. So I used only the index in the file name, and now I can extract them all. If someone can explain why that happens, I would appreciate it a lot.

image
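
A possible explanation, not confirmed anywhere in this thread: Windows builds of ffprobe may end each output line with CRLF, which would leave a trailing carriage return in lang. That would make the file name invalid on Windows and make the terminal overwrite each echoed line, so only the last one stays visible. A minimal sketch, assuming that is the cause, strips the carriage returns before the loop. The `sample` function and its data are made up to stand in for ffprobe.

```bash
# Simulated ffprobe output with Windows line endings (an assumption).
sample() { printf '2,eng\r\n3,fre\r\n'; }

# tr -d '\r' removes the carriage returns, so lang no longer ends in \r.
sample | tr -d '\r' | while IFS=, read -r idx lang
do
    echo "Movie_${lang}_${idx}.srt"
done
```

In the gist's function, the equivalent change would be to pipe the ffprobe command through `tr -d '\r'` inside the backticks that fill mappings.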

@anhthoai

anhthoai commented Dec 4, 2021

Can anyone help me get the lang into the file name? The index alone is really useless.

@kowalcj0
Author

Hi @anhthoai
To debug this issue, I'd suggest starting by printing the mappings before the while loop.
Simply add echo ${mappings} before the OLDIFS=$IFS line and re-run the script.
If you see something like 2,eng 3,fre 4,rus, it means that the video file has the language tags set correctly and that the index (idx) to language mappings are properly defined.

If you see something different, then please post your output.

The other thing I'd check is the format of the embedded subtitles.
You can display it with ffprobe:

ffprobe -loglevel error -select_streams s -show_entries stream=codec_name,index:stream_tags=language -of csv=p=0 "your-video.mkv"
2,subrip,eng
3,subrip,fre
4,subrip,rus

If you see a subtitle format other than subrip, then you might need to change the extension of the extracted subtitle from ${idx}.srt to:

  • ${idx}.ass for ass (also known as SubStationAlpha)
  • ${idx}.sup for hdmv_pgs_subtitle (an image-based format that cannot be converted to .srt, so it also needs -c:s copy)
  • etc
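
The codec-to-extension choice above can be sketched as a small lookup. `sub_ext` is a hypothetical helper, not part of the gist, and it covers only three text-based codecs; anything else is reported as unknown rather than guessed.

```bash
# Hypothetical helper: map an ffprobe codec_name to a file extension.
# Returns non-zero for codecs this sketch does not know about.
sub_ext() {
    case "$1" in
        subrip) echo "srt" ;;
        ass)    echo "ass" ;;
        webvtt) echo "vtt" ;;
        *)      return 1 ;;
    esac
}

# Example: pick the extension for a stream reported as "3,ass,fre".
ext=$(sub_ext ass) && echo "Movie_fre_3.${ext}"
```

The loop would then need to read three fields (`idx`, `codec`, `lang`) from the ffprobe command shown above instead of two.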

@flabdablet

If an input file has more than one embedded subtitle, you can save a lot of time by building a command line that makes a single invocation of ffmpeg extract all of them in one pass.

function subs() {
  local movie idx lang subs=
  local -a dests=()
  for movie
  do
    ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "$movie" |
    {
      while IFS=, read idx lang
      do
        subs+=" ${lang}_$idx"
        dests+=(-map "0:$idx" "${movie%.*}_${lang}_$idx.srt")
      done
      if test -n "$subs"
      then
        echo "Extracting subtitles from $movie:$subs"
        ffmpeg -nostdin -y -hide_banner -loglevel quiet -i "$movie" "${dests[@]}"
      else
        echo "No subtitles in $movie"
      fi
    }
  done
}

Rather than have the function process only its first argument, there's an outer for loop that walks through all of them so you can pass multiple filenames explicitly and/or use wildcards.

The list of available subtitle streams is piped straight from the output of ffprobe into the while read loop that parses them, rather than being collected into a shell variable in between. This requires that the subs and dests[] variables that the loop fills in are then used within the same brace-delimited compound statement as the loop itself: the shell runs pipeline components in parallel, each in its own subshell. If it were only the actual while loop that had the ffprobe output piped into it, like

ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "$movie" |
while IFS=, read idx lang
do
  subs+=" ${lang}_$idx"
  dests+=(-map "0:$idx" "${movie%.*}_${lang}_$idx.srt")
done

then the loop's subshell would terminate with the loop, and the changes made to subs and dests[] inside the loop would be lost. Note also the use of environment variable prefix syntax to supply a modified IFS to a single command only (the read builtin, in this instance), making it clearer why IFS is being modified and removing the need for an explicit save and restore.
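
The subshell effect is easy to reproduce without ffprobe. This is a minimal sketch assuming bash with default settings (the `lastpipe` option off); the counter and the three-line input are made up for illustration.

```bash
count=0

# The loop is the last pipeline component, so it runs in a subshell
# and its changes to count are lost when the loop ends.
printf 'a\nb\nc\n' | while read -r line
do
    count=$((count + 1))
done
echo "after plain pipeline: $count"

# Inside a brace group, the loop and the code that uses the result
# share the same subshell, so the updated value is still visible.
printf 'a\nb\nc\n' | {
    while read -r line
    do
        count=$((count + 1))
    done
    echo "inside brace group: $count"
}
```

Under those assumptions the first message reports 0 and the second reports 3.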

Building dests[] as an array with each of what will become ffmpeg arguments as an individual element, rather than simply appending pieces to a single string as is done with subs, allows those arguments to be cleanly expanded into the ffmpeg command line as separate words even if they contain whitespace or special shell characters. Trying to do this kind of thing in a strictly POSIX-compatible shell that doesn't support arrays is always a nightmare of horrible edge cases. Just don't.
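
To illustrate the array point, here is a small bash sketch with a made-up file name containing spaces and parentheses. It prints each array element in brackets, which shows that every element stays a single word when expanded as "${dests[@]}".

```bash
# Hypothetical file name with whitespace and shell-special characters.
movie="My Movie (2021).mkv"

dests=()
dests+=(-map "0:2" "${movie%.*}_eng_2.srt")
dests+=(-map "0:3" "${movie%.*}_fre_3.srt")

# Quoted "${dests[@]}" expands to exactly one word per element.
printf '[%s]\n' "${dests[@]}"
```

The array holds six elements, and the two file names each arrive as one argument despite the spaces.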

@ripsnortntear

ripsnortntear commented Jul 30, 2023

The issue I'm having is that this produces 0 KB subtitle files.

#!/usr/bin/env bash

function subs() {
  local movie idx lang subs=
  local -a dests=()
  for movie
  do
    ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "$movie" |
    {
      while IFS=, read idx lang
      do
        subs+=" ${lang}_$idx"
        dests+=(-map "0:$idx" "${movie%.*}_${lang}_$idx.srt")
      done
      if test -n "$subs"
      then
        echo "Extracting subtitles from $movie:$subs"
        ffmpeg -nostdin -y -hide_banner -loglevel quiet -i "$movie" "${dests[@]}"
      else
        echo "No subtitles in $movie"
      fi
    }
  done
}

subs "${1}"

Screenshot 2023-07-30 060839
