@jerieljan
Created April 11, 2023 08:53
#!/usr/bin/env bash
###
# whisper-cpp - a shell script that easily creates subtitles out of movie files and muxes them.
#
# Requirements
# - ffmpeg
# - whisper.cpp
# - Make sure to run https://github.com/ggerganov/whisper.cpp#quick-start first!
#
# Usage:
# - Set up whisper.cpp: clone and build it, download a model (ggml-large), and make sure `main` works.
# - Edit WHISPER_CPP_DIR below to match your whisper.cpp directory.
# - Adjust your THREAD_COUNT according to your CPU specs.
# - Run this script with the following params:
# `whisper-cpp <movie-file> [lang_code]` (lang_code defaults to `en`)
# - After running the script, you should receive a copy of the file with subtitles muxed in.
###
# #0. Initialization and configuration
FILE="${1}"
# Language code passed to whisper.cpp; defaults to English when omitted.
LANG="${2:-en}"
THREAD_COUNT=8
WORKING_DIR=$(pwd)
WHISPER_CPP_DIR="/path/to/whisper.cpp"
# Use basename to just get the filename.
FILENAME=$(basename -- "${FILE}")
# If you need only the "name" and none of the parts after the first dot, use this.
# Ex: sample.tar.gz -> "sample" and "tar.gz"
DOTS_FILENAME="${FILENAME%%.*}"
DOTS_EXTENSION="${FILENAME#*.}"
# If you need the full name and only the last extension, use this.
# Ex: sample.tar.gz -> "sample.tar" and "gz"
PROPER_EXTENSION="${FILENAME##*.}"
PROPER_FILENAME="${FILENAME%.*}"
UNIXTIME=$(date +%s)
# PREFIX is not set anywhere in this script, so it expands to an empty string here.
NEW_FILENAME="${PREFIX}${UNIXTIME}-${RANDOM}.${DOTS_EXTENSION}"
# Always execute as if you're in the location of the script file.
# cd "$(dirname "$0")"
# Halt the script when errors occur.
set -e
# #1. Extract waveform from input
FILE_WAV="${DOTS_FILENAME}.wav"
echo "⚙️ Processing :: ${FILENAME}"
echo "⚙️ Extracting wav :: ${FILE_WAV}"
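# Drop the video stream (-vn) and write 16 kHz mono 16-bit PCM,
# matching the conversion command recommended in the whisper.cpp README.
ffmpeg -i "${FILE}" \
-vn -ar 16000 -ac 1 -c:a pcm_s16le \
"${FILE_WAV}"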
# #2. Run whisper.cpp
FILE_SRT="${DOTS_FILENAME}.srt"
echo "⚙️ Running Whisper.cpp on :: ${FILE_WAV} -> ${FILE_SRT}"
"${WHISPER_CPP_DIR}/main" -m "${WHISPER_CPP_DIR}/models/ggml-large.bin" \
-f "${FILE_WAV}" \
-osrt -t "${THREAD_COUNT}" -l "${LANG}"
# whisper.cpp writes the subtitles as <input>.srt (here: <name>.wav.srt), so rename them.
mv "${FILE_WAV}.srt" "${FILE_SRT}"
# #3. Mux SRT to File
FILE_SUBBED="${DOTS_FILENAME} [💬].mkv"
echo "⚙️ Muxing files :: ${FILE_SUBBED}"
ffmpeg -i "${FILE}" \
-f srt -i "${FILE_SRT}" \
-map 0:v -map 0:a -map 1:s \
-c:v copy -c:a copy -c:s copy \
"${FILE_SUBBED}"
# #4. Cleanup
rm "${FILE_WAV}"
# rm "${FILE_SRT}" # you'll want to keep the SRT file if you want to add caption tracks separately in YouTube.
echo "Done! Check out ${FILE_SUBBED}"
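The two pairs of name variables near the top of the script differ only in which dot they split on. Here is a minimal sketch of that difference. The `split_name` helper and the sample path are hypothetical, made up for illustration, and are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): prints the four
# name/extension variants the script computes for a given path.
split_name() {
  local filename
  filename=$(basename -- "$1")
  echo "dots_filename=${filename%%.*}"     # everything before the FIRST dot
  echo "dots_extension=${filename#*.}"     # everything after the FIRST dot
  echo "proper_filename=${filename%.*}"    # everything before the LAST dot
  echo "proper_extension=${filename##*.}"  # everything after the LAST dot
}

# Made-up example path with several dots in the name.
split_name "/videos/My.Movie.2020.mkv"
# dots_filename=My
# dots_extension=Movie.2020.mkv
# proper_filename=My.Movie.2020
# proper_extension=mkv
```

Because the script builds its output names from DOTS_FILENAME, a file such as My.Movie.2020.mkv would produce My.wav, My.srt and "My [💬].mkv". Using PROPER_FILENAME instead would likely keep the full title, at the cost of editing the three places where the output names are built.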
@jerieljan
Author

Many of the variables at the start are unused here; I carry them over from my personal scripts and will remove them when I can.
