
@bathtime
Last active December 26, 2023 22:22
srtssa.sh
#!/bin/bash
usage() {
echo "Version 0.15.1, GNU GPLv3
*** WARNING: ALPHA VERSION CODE!!! THIS PROGRAM EATS HAMSTERS! ***
This program takes an .srt, .txt, .pdf, or .epub file, translates it, and merges both translations
into an .ssa or .txt file in a parallel manner, allowing both subtitles to be viewed at the same time.
This can be helpful for people learning a new language.
Usage: ./srtssa.sh [OPTIONS]... [FILE]
Options:
-s, --source <lang> language of input file (ex., en, fr, ru...) (default: 'auto')
-t, --target <lang> language to translate to (default: [-lang1|-lang2])
make translation blank: 'blank'
just copy the source language: 'copy'
to omit altogether, set to 'blank' and add the flag '-delblank'
-i, --input <file> input file (ex., .srt, .pdf, .txt)
-o, --output <file> output file (ex., .ssa, .ass, .txt)
-sx, --sext <ext> source file extension (default: txt)
-tx, --text <ext> target file extension (default: txt)
-a, --above <lang> place this language above the other (default: [source])
-l1,--lang1 <lang> 1st preferred target language (default: 'en')
-l2,--lang2 <lang> 2nd preferred target language (default: 'fr')
-t1,--top1 <lang> 1st preferred language to be placed on top (default: 'en'|source)
-t2,--top2 <lang> 2nd preferred language to be placed on top (default: 'fr'|target)
-m, --mode <mode> options: srt, text, epub (default: source file's extension or 'text')
Interface:
-h, --help this help display
-g, --gtk use graphical interface (requires zenity)
-q, --quiet only display progress, errors, and final result (default)
-z, --stealth prevent all information from being printed to screen (except errors)
-v, --verbose display formatting information
-d, --debug prints 'verbose' + raw text elements for debugging.
-x, --xbug prints as --debug but also pipes source lines through 'less'
Advanced Options (use at your own risk!):
-w, --wait <0-60> seconds to wait between translations (default: '10')
--counting [file|logic] label line numbers according to the srt file's number or logical count.
This is useful when lines are skipped or with misnumbered srt files
--alt use alternate translation engine (translate-shell app required)
--dif [0,1-1000] difference in length between the source and target text (in %)
which triggers a re-translation (default: '200'[%], '0'=off)
--mincutoff [0-1000] the minimum number of characters allowed; shorter lines are erased [TODO]
--noerr remove error detection
--norepair do not retranslate on error
--makesubs make an ssa file from txt, pdf, epub... (times will all be set to 0)
--insert insert translated text directly below
--nosource exclude source language from output file
--justformat exit after formatting
--log appends errors to a log file: [source].log
--keeptmp don't delete temporary files
-ch, --chars [100-20000] max characters sent per translation (default: '5000')
-e, --engine [engine] allows for the user to add their own translation engine.
%s = source lang, %t = target lang, %x = text to be translated
ex., ./srtssa.sh -ch 1000 --engine 'trans -s %s -t %t -b %x' file.srt
Text Manipulation:
--clean *** EXPERIMENTAL *** break text down into simple sentences
--initclean same as '--clean' but cleans before formatting lines
not recommended for use with srt files!
--cleantrans same as '--clean' but cleans translated text
--clean-newline [text] specifies how '--clean' will handle new lines (default '\n\n')
--newline [text] change the newlines into another character (default ' ')
--backnew [text] change backslashed newline from one character to another (default '\\n')
--transnew [text] insert another character for newline text to be translated (default ' ')
--onelineeach insert another line after each newline
--linespaced [0-10] determines how many consecutive newlines are allowed (--onelineeach forces this to 1)
--brackreturn push bracketed text to a new line
--delblank delete results with no text
--deluni delete any unicode (ex., \u003c)
--deltag delete tags (ex., <i>...</i>)
--removeline [text] remove any line containing this text
--simpleclean inserts two newlines after each comma, period, or semi-colon
--commareturn inserts two newlines after each comma
Epub Options:
--w3mepub render epub files with 'w3m' (must be installed)
--html2text render epub files with 'html2text' (must be installed)
--removepage attempt to remove page numbers from text
--raw keep return carriages and other potentially bad characters
--epub-newline1 [text] determines how lines are joined at '<p' tags (default '\n\n')
only works with default engine!
--epub-newline2 [text] determines how lines are joined at '<' tags (default ' ')
only works with default engine!
Pdf Options:
--mupdf render pdf files with 'mutool' ('mupdf-tools' must be installed)
to render pdf files by default 'poppler-utils' must be installed
Misc Options:
--calibre render several types of files with 'ebook-convert' (calibre must be installed)
--pandoc render several types of files with 'pandoc' (must be installed)
Example Usage:
$ ./srtssa.sh -s en -t fr -a en -i movie.srt
The above example translates an English .srt file into a merged (English and French) .ssa file, with English displayed at the top of the screen and French at the bottom."
}
interface=5
tar_pref1="en"
tar_pref2="fr"
top_pref1="fr"
top_pref2="en"
linespaced=2
makesubs=0
onelineeach=0
wait=10
charMax=5000
dif=200
w3mepub=0
raw=0
keeptmp=0
notrans=0
counting='logic'
nosource=0
noerr=0
newline=' '
backnew='\\n'
transnew=' '
clean=0
initclean=1
cleantrans=0
clean_newline='\n\n'
brackreturn=0
epub_newline1='\n\n'
epub_newline2=''
removepage=0
html2text=0
calibre=0
mupdf=0
pandoc=0
norepair=0
deluni=0
deltag=0
delblank=0
removeline=''
commareturn=1
simpleclean=0
justformat=0
gtk=0
alt=0
mincutoff=1
abort() { echo -e "$@ Aborting."; exit 3; }
bye() { if [ "$interface" -ge 1 ]; then echo -e "\nGoodbye... "; fi; rm -rf "$gui_file"; if [ "$keeptmp" != "1" ]; then rm -rf "$tmp_file";fi; }
filterOpts() { echo "$@" | perl -CS -pe 's#^(?=\S)#--#g;'; }
if [ "$1" = "" ] && [ "$gtk" -eq 1 ] ; then
src_file=$(zenity --text='Please choose a source file to translate' --file-selection --multiple --file-filter=*.*)
[ -z "$src_file" ] && abort 'No source file selected.'
langS=$(zenity --list --editable --title="Choose Source Language" --column="Language" en fr auto)
[ -z "$langS" ] && langS='-s auto' || langS='-s '$langS
langT=$(zenity --list --editable --height=240 --title="Choose Target Language" --column="Language" en fr auto copy blank)
[ -z "$langT" ] && langT='-t auto' || langT='-t '$langT
cleanOpts=$(filterOpts $(zenity --list --checklist --editable --height=350 --separator=' --' --title="Choose Cleanup Options" --column="Type" --column="Cleanup:" FALSE initclean FALSE clean FALSE cleantrans FALSE brackreturn TRUE delblank TRUE deluni FALSE deltag FALSE removepage))
newlineOpts=$(filterOpts $(zenity --list --checklist --editable --height=750 --separator=' --' --title="Choose New Line Options" --column="Type" --column="Cleanup:" TRUE 'newline " "' FALSE 'newline "\n"' FALSE 'newline "\n\n"' FALSE 'newline "\\n"' FALSE 'newline ""' TRUE 'clean-newline "\n\n"' FALSE 'clean-newline "\n"' FALSE 'clean-newline "\n\n"' FALSE 'clean-newline "\\n"' FALSE 'clean-newline ""' TRUE 'backnew "\\n"' FALSE 'backnew "\n"' FALSE 'backnew " "' FALSE 'backnew ""' TRUE 'transnew " "' FALSE 'transnew "\n"' FALSE 'transnew "\\n"' FALSE 'transnew ""' FALSE 'epub-newline1 "\n"' TRUE 'epub-newline1 "\n\n"' FALSE 'epub-newline1 "\n"' FALSE 'epub-newline1 "\\n"' FALSE 'epub-newline1 ""' FALSE 'epub-newline2 "\n"' FALSE 'epub-newline2 "\n\n"' FALSE 'epub-newline2 "\n"' FALSE 'epub-newline2 "\\n"' TRUE "epub-newline2 ' '" ))
engineOpts=$(filterOpts $(zenity --list --checklist --height=550 --editable --separator=' --' --title="Choose Engine Options" --column="Type" --column="Cleanup:" FALSE alt FALSE noerr FALSE norepair FALSE insert FALSE nosource FALSE justformat FALSE log FALSE keeptmp FALSE w3mepub FALSE html2text FALSE raw FALSE mupdf FALSE calibre FALSE pandoc))
options=$langS' '$langT' '$cleanOpts' '$newlineOpts' '$engineOpts
[ "$interface" -ge 4 ] && echo "Options chosen:"$options
set -- $options "${1%.*}"
preset=1
fi
#export PERL_UNICODE=SDL
## Format text for increased readability
clean() {
## … = 2026, « = 00AB, » = 00BB, — = 2014
text=$(LC_ALL=C echo "$1" | LC_ALL=C perl -CS -pe 's/ {2,}/ /g; \
s#(?<=[:|;])\s?\)?\s?(?!\s?\))#$&'"${clean_newline}"'#g; \
s#\.{3,}#...#g; s#\N{U+2026}\.{1,}#\N{U+2026}#g; \
s#\.{1,}\N{U+2026}#\N{U+2026}#g; s#\.{3,}#...#g;\
s#(?<!^^)(?<!\S)(?<!\d)(\-|\N{U+2014})#'"${clean_newline}"'$&#g; \
s#(?<=\,)\s(?=\N{U+2014}|\"\S)#'"${clean_newline}"'#g; \ ## ,— ,"
s#(?<!^^)(?<!ddd)\N{U+00AB}#'"${clean_newline}"'\N{U+00AB}#g; \
s#(?<=(?:^\.|\N{U+2026}))\s?(?=\-|\N{U+2014})#$1'"${clean_newline}"'#g;\ ## Em dash: .—
s#(?<!\N{U+2014}\N{U+2026})(?<!\N{U+2014}\s\N{U+2026})(?<=([\N{U+2026}|\.|\?|\!|\"|\N{U+00BB}]))\s(?![\!|\?|\:|\;|\)|\"|\s|\N{U+00BB}|\.])#'"${clean_newline}"'#g; \
s#(?<=(?:\N{U+00BB}|\?|\!|\.))(?![\d|\N{U+00BB}|\!|\?|\"|\)|\s|\.|\:|\;|\N{U+2026}])#'"${clean_newline}"'#g;')
## Brackets on a new line:
if [ "$brackreturn" -eq 1 ]; then
text=$(LC_ALL=C echo "$text" | LC_ALL=C perl -CS -pe 's/ {2,}/ /g; \
s#(?<!^^)(?=\s+\(+\s?)\(+#'"${clean_newline}"'$&#g; \
s#(?<=[^\N{U+2014}]([\N{U+2026}|\.|\?|\!|\"|\)|\N{U+00BB}]))\s(?![\!|\?|\)|\;|\:|\"|\s|\N{U+00BB}|\.])#'"${clean_newline}"'#g; \
s#(?<=(?:\N{U+00BB}|\?|\!|\.|\)))(?![\d|\N{U+00BB}|\!|\?|\"|\)|\s|\;|\:|\.|\N{U+2026}])#'"${clean_newline}"'#g;')
fi
[ "$removepage" -eq 1 ] && text=$(echo "$text" | LC_ALL=C perl -CS -pe 's#^\d+##; s#\d+$##g; ')
## Remove excess spaces and blank characters
text=$(LC_ALL=C echo "$text" | LC_ALL=C perl -CS -pe 's#\N{U+00A0}##g; s# +# #g; s#^ ##g; s# $##g;')
echo "$text"
}
translation() {
text=$(LC_ALL=C echo "$@" | LC_ALL=C sed -r 's#(\#|\&|\*)##g; s#(\\n|\n)#'"${transnew}"'#g;')
#LC_ALL=C text=$(LC_ALL=C echo "$@" | LC_ALL=C sed -r 's#(\#|\&|\*)##g; s#(\\n|\n)#'"${transnew}"'#g; s#–#-#g; s#…#...#g;')
#text="$@"
if [ "$tar_lang" = "copy" ]; then
LC_ALL=C echo "$text"
elif [ "$tar_lang" = "blank" ]; then
:
elif [ "$alt" -eq 1 ]; then
LC_ALL=C ./trans -s "$src_lang" -t "$tar_lang" -b "$text"
elif [ -n "$engine" ]; then
run="$engine"
run=${run/\%s/$src_lang}
run=${run/\%t/$tar_lang}
run=${run/\%x/'"$@"'}
LC_ALL=C eval "$run" | sed -r 's/[\#|\&|\*]//g; s#u200b# #g;'
else
LC_ALL=C wget -U "Mozilla/5.0" -q -O- "http://translate.googleapis.com/translate_a/single?client=gtx&sl=$src_lang&tl=$tar_lang&dt=t&q=$text" | perl -CS -X -lne 'push @a,/(?<!\,\[\[?)\[\"(.*?)(?<!\\)\"/g;END{print "@a" }' | LC_ALL=C perl -CS -pwe 's/\N{U+005C}\N{U+0022}\s?/\N{U+0022}/g;' | LC_ALL=C sed -r ' s#\\u200b##g;'
fi
}
## Needed to decode line numbers from Roman Numerals when translating Latin
roman() {
input=$@
output=""
len=${#input}
roman_val() {
N=$1
one=$2
five=$3
ten=$4
out=""
case $N in
0) out+="" ;;
[123]) while [[ $N -gt 0 ]]; do
out+="$one"
N=$(($N-1))
done ;;
4) out+="$one$five" ;;
5) out+="$five" ;;
[678]) out+="$five"
N=$(($N-5))
while [[ $N -gt 0 ]]; do
out+="$one"
N=$(($N-1))
done ;;
9) while [[ $N -lt 10 ]]; do
out+="$one"
N=$(($N+1))
done
out+="$ten" ;;
esac
echo $out
}
while [[ $len -gt 0 ]]
do
num=${input:0:1}
case $len in
1) output+="$(roman_val $num I V X)" ;;
2) output+="$(roman_val $num X L C)" ;;
3) output+="$(roman_val $num C D M)" ;;
4) output+="$(roman_val $num M ${U}V${R} ${U}X${R})" ;;
*) num=${input:0:(-3)}
while [[ $num -gt 0 ]]; do
output+="M"
num=$(($num-1))
done ;;
esac
input=${input:1} ; len=${#input}
done
echo $output
}
trap 'bye' EXIT
## With 'getopt -a', single-dash long options such as -l1 are returned in their long form (--l1)
eval set -- "$(getopt -a -n st2ssa -o i:xo:w:a:s:ndzghvqt:e:m: --long log,sx:,mode:,tx:,sext:,text:,dif:,noerr,notrans,engine:,alt,ch:,chars:,l1:,l2:,t1:,t2:,gtk,stealth,verbose,debug,lang1:,lang2:,top1:,top2:,help,chunks:,source:,target:,output:,above:,w3mepub,wait:,raw,delblank,counting:,keeptmp,nosource,transnew:,nl:,newline:,backnew:,insert,xbug,brackreturn,clean,initclean,clean-newline:,cleantrans,justformat,epub-newline1:,epub-newline2:,removepage,html2text,calibre,mupdf,pandoc,norepair,deluni,deltag,quiet,input:,removeline:,makesubs,onelineeach,simpleclean,commareturn,mincutoff:,linespaced: -- "$@")"
while :; do
case "$1" in
-a | --above) top=$2; shift 2 ;;
--l1 | --lang1) tar_pref1=$2; shift 2 ;;
--l2 | --lang2) tar_pref2=$2; shift 2 ;;
--t1 | --top1) top_pref1=$2; shift 2 ;;
--t2 | --top2) top_pref2=$2; shift 2 ;;
--ch | --chars) charMax=$2; shift 2 ;;
--dif) dif=$2; shift 2 ;;
-w | --wait) wait=$2; shift 2 ;;
-i | --input) src_file="$2"; shift 2 ;;
-o | --output) out_file="$2"; shift 2 ;;
-s | --source) src_lang=$2; shift 2 ;;
-t | --target) tar_lang=$2; shift 2 ;;
-h | --help) usage; exit ;;
-g | --gtk) gtk=1;
[ -z "$(command -v zenity)" ] && abort 'zenity not installed.';
shift 1 ;;
-z | --stealth) interface="0"; shift 1 ;;
--insert) insert="1"; shift 1 ;;
-q | --quiet) interface="1"; shift 1 ;;
-v | --verbose) interface="2"; shift 1 ;;
-d | --debug) interface="3"; shift 1 ;;
-x | --xbug) interface="4"; shift 1 ;;
--makesubs) makesubs="1"; shift 1 ;;
--onelineeach) onelineeach="1"; shift 1 ;;
--counting) counting="$2"; shift 2 ;;
--nl | --newline) newline="$2"; shift 2 ;;
--backnew) backnew="$2"; shift 2 ;;
--linespaced) linespaced="$2"; shift 2 ;;
--clean-newline) clean_newline="$2"; shift 2 ;;
--epub-newline1) epub_newline1="$2"; shift 2 ;;
--epub-newline2) epub_newline2="$2"; shift 2 ;;
--removeline) removeline="$2"; shift 2 ;;
--mincutoff) mincutoff="$2"; shift 2 ;;
--simpleclean) simpleclean="1"; shift 1 ;;
--commareturn) commareturn="1"; shift 1 ;;
--w3mepub) w3mepub="1";
[ -z "$(command -v w3m)" ] && abort 'w3m not installed.';
shift 1 ;;
--transnew) transnew="$2"; shift 2 ;;
--raw) raw="1"; shift 1 ;;
--delblank) delblank="1"; shift 1 ;;
--deltag) deltag="1"; shift 1 ;;
--brackreturn) brackreturn="1"; shift 1 ;;
--clean) clean="1"; shift 1 ;;
--initclean) initclean="1"; shift 1 ;;
--cleantrans) cleantrans="1"; shift 1 ;;
--html2text) html2text="1"; shift 1 ;;
--calibre) calibre="1"; shift 1 ;;
--deluni) deluni="1"; shift 1 ;;
--mupdf) mupdf="1"; shift 1 ;;
--pandoc) pandoc="1"; shift 1 ;;
--alt) alt=1; shift 1 ;;
--removepage) removepage=1; shift 1 ;;
-e | --engine) engine="$2"; shift 2 ;;
--sx | --sext) src_ext=$2; shift 2 ;;
--tx | --text) tar_ext=$2; shift 2 ;;
-m | --mode) mode=$2; shift 2 ;;
--log) log=1; shift 1 ;;
--justformat) justformat=1; shift 1 ;;
--keeptmp) keeptmp=1; shift 1 ;;
--noerr) noerr=1; shift 1 ;;
--norepair) norepair=1; shift 1 ;;
--nosource) nosource=1; shift 1 ;;
--notrans) notrans=1; tar_lang="none"; shift 1 ;;
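--) shift; break ;;
## Without a default branch, an unhandled option would loop forever
*) abort "Unrecognized option '$1'." ;;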
esac
done
## '$@' returns an empty quote for the file name, hence the need for the preset variable
[ -z "$src_file" ] && [ -z "$preset" ] && src_file="$@"
line=1
chunk=0
chunkNum=0
chunkTot=0
if [ -z "$src_file" ]; then
case $mode in
pdf) src_ext="pdf" ;;
text) src_ext="txt" ;;
epub) src_ext="epub" ;;
srt) src_ext="srt" ;;
*) src_ext="*" ;;
esac
## Gui interface prompt for source file
if [ "$gtk" -eq 1 ]; then
gui_file=$(pwd)"/tmp.$(date +"%m%d%H%M%S").tmp"
touch "$gui_file"
CURRENT_PID=$$
src_file=$(zenity --text='Please choose a source file to translate' --file-selection --multiple --file-filter=*.$src_ext)
[ -z "$src_file" ] && abort 'No source file selected.'
src_lang="$(zenity --text='Please choose source language:' --title='Source language:' --ok-label='Translate' --list --editable --column="Language" --height=225 --extra-button='auto' "en" "fr" "ru")"
[ -z "$src_lang" ] && abort 'No source language chosen.'
else
echo -e "Pick an .srt file to translate:\n"
select fname in *.$src_ext; do
src_file="$fname"; break
done
fi
[ -z "$src_file" ] && abort 'No source file selected.'
fi
[ ! -f "$src_file" ] && abort "File '$src_file' does not exist."
src_ext="${src_file##*.}"
if [ -z "$mode" ]; then
case $src_ext in
srt) mode="srt" ;;
epub) mode="epub" ;;
pdf) mode="pdf" ;;
txt) mode="text" ;;
*) mode="text" ;;
esac
fi
if [ -z "$tar_ext" ]; then
if [ "$mode" = "srt" ]; then
tar_ext="ssa";
elif [ "$makesubs" -eq 1 ]; then
tar_ext="srt"
else
tar_ext="txt"
fi
fi
if [ -z "$out_file" ]; then
## Strip the source extension and append ' (new).<target extension>'
out_file="${src_file%.*} (new).$tar_ext"
echo "New output file: $out_file"
fi
## Rendering engines
if [ "$calibre" -eq 1 ]; then
[ -z "$(command -v ebook-convert)" ] && abort 'calibre not installed.'
ebook-convert "$src_file" "$out_file"
src_file="$out_file"
elif [ "$mupdf" -eq 1 ]; then
[ -z "$(command -v mutool)" ] && abort 'mutool not installed.'
mutool convert -o "$out_file" "$src_file"
src_file="$out_file"
elif [ "$pandoc" -eq 1 ]; then
[ -z "$(command -v pandoc)" ] && abort 'pandoc not installed.'
pandoc -t rst "$src_file" -o "$out_file"
src_file="$out_file"
elif [ "$mode" = "pdf" ]; then
[ -z "$(command -v pdftotext)" ] && abort 'pdftotext not installed.'
pdftotext -eol unix -nopgbrk "$src_file" "$out_file"
src_file="$out_file"
elif [ "$mode" = "epub" ]; then
rm -rf tmp "$out_file"
unzip -d tmp "$src_file" > /dev/null 2>&1
text=''
files="$(find tmp/ -type f \( -name "*.xhtml" -o -name "*.html" \) | sort -V)"
[ "$interface" -ge 3 ] && echo -e "\nEpub [x]html files:\n\n\033[0;32m$files\033[0m"
## Long file names with spaces need to be converted to be processed properly
for file in $(echo "$files" | sed -s 's/ /+/g; s/:/ /g'); do
## Add spaces back to file name
file=$(echo "$file" | sed 's/+/ /g')
[ "$interface" -ge 4 ] && debug=$debug$(cat "$file")
if [ "$w3mepub" -eq 1 ]; then
text=$text$(w3m "$file" -dump)
elif [ "$html2text" -eq 1 ]; then
text=$text$(html2text --ignore-emphasis "$file")
else
## Decode html to UTF-8
LC_ALL=C perl -i -MHTML::Entities -0777 -CSDA -ne 'print decode_entities($_)' "$file"
tmptxt=$(cat "$file")
text=$text$(echo "$tmptxt" \
| perl -CS -ple 's#<\N{U+002F}?(a|b|u|i|span)>##g; s#<a.*?>##g;'\
| perl -CS -0777 -ne 'push @a,print "$&'"${epub_newline1}"'" while /<p(.*?)<\/p>/gs' \
| perl -CS -0777 -ne 'push @a,print "$1'"${epub_newline2}"'" while />(.*?)</gs' | tr '\0' '\n')
fi
done
[ "$interface" -ge 4 ] && echo "$text" | less
[ "$raw" -eq 0 ] && text=$(echo "$text" | perl -CS -ple 's#(\x00|\r)##g; s#^\s$##g; ')
echo "$text" > "$out_file"
src_file="$out_file"
[ "$keeptmp" -ne 1 ] && rm -rf tmp
[ "$interface" -ge 4 ] && echo "$debug" | less
[ "$interface" -ge 4 ] && cat "$out_file" | less
fi
tmp_file="$out_file.$(date +"%m%d%H%M%S").tmp"
if ([ "$tar_lang" = "" ] || [ "$tar_lang" = "auto" ]) && [ "$notrans" -eq 0 ] && [ "$gtk" -eq 1 ]; then
tar_lang="$(zenity --text='Please choose a target language:' --title='Target language:' --extra-button='auto' --ok-label='Translate' --list --editable --column="Language" --height=225 "en" "fr" "ru")"
[ -z "$tar_lang" ] && abort 'No target language selected.'
(while [ -f "$gui_file" ]; do cat "$gui_file"; sleep .5; done | (zenity --title='Press 'X' to cancel' --text='Translating...' --progress --percentage=0 --auto-kill --time-remaining --auto-close --cancel-label='background' || kill $CURRENT_PID; rm -rf "$gui_file"))&
fi
t_file="$(file -bi "$src_file" | awk -F'=' '{print $2}')"
if [ "$t_file" = "utf-8" ]; then
iconv -f "utf-8" "$src_file" -o "$tmp_file"
#elif [ "$t_file" = "unknown-8bit" ]; then
# cp "$src_file" "$tmp_file"
else
LC_ALL=C perl -CS -pwe '' "$src_file" > "$tmp_file"
fi
[ "$interface" -ge 3 ] && echo -e "\nInitial settings:\nSource lang: \033[0;32m$src_lang \033[0mTarget lang: \033[0;35m$tar_lang\033[0m top: \033[0;33m$top\033[0m bottom: \033[0;33m$bot\033[0m Mode: \033[0;33m$mode\033[0m File type: \033[0;33m$t_file\033[0m\n"
## Automatically find source language
if [ "$src_lang" = "" ] || [ "$src_lang" = "auto" ]; then
## Grab text from the middle of the file
middle=$(( $(wc -l < "$src_file") / 2 ))
if [ "$mode" = "srt" ]; then
text=$(tail -n $middle "$src_file" | head -n 20 | sed -n -r '/(-->|[0-9])/,${//!p;}' | tr '\r\n' ' ')
else
text=$(tail -n $middle "$src_file" | head -n 10 | tr '\r\n' ' ')
fi
[ "$interface" -ge 3 ] && echo -e "Grabbing text from line \033[0;35m$middle\033[0m for language detection:\n\n\033[0;32m\"$text\"\033[0m\n"
src_lang=$(wget -U "Mozilla/5.0" -q -O- "http://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl=fr&dt=t&q=$text")
src_lang=${src_lang##*\[\"}; src_lang=${src_lang%%\"*}
fi
if [ "$tar_lang" = "auto" ] || [ "$tar_lang" = "" ]; then
if [ "$src_lang" = "$tar_pref1" ]; then tar_lang=$tar_pref2; else tar_lang=$tar_pref1; fi
fi
if [ -z "$top" ]; then
if [ "$src_lang" = "$top_pref1" ]; then
top=$src_lang; bot=$tar_lang
elif [ "$tar_lang" = "$top_pref1" ]; then
top=$tar_lang; bot=$src_lang
elif [ "$src_lang" = "$top_pref2" ]; then
top=$src_lang; bot=$tar_lang
elif [ "$tar_lang" = "$top_pref2" ]; then
top=$tar_lang; bot=$src_lang
else
top=$src_lang; bot=$tar_lang
fi
elif [ "$top" = "$src_lang" ]; then bot=$tar_lang; else bot=$src_lang; fi
t_file="$(file -bi "$tmp_file" | awk -F'=' '{print $2}')"
[ "$interface" -ge 3 ] && echo -e "\nFinal settings:\nSource lang: \033[0;32m$src_lang \033[0mTarget lang: \033[0;35m$tar_lang\033[0m top: \033[0;33m$top\033[0m bottom: \033[0;33m$bot\033[0m Mode: \033[0;33m$mode\033[0m File type: \033[0;33m$t_file\033[0m\n"
linespaced=1
[ "$onelineeach" -eq 1 ] && LC_ALL=C perl -CS -i -pe 's`\n`\n\n`g;' "$tmp_file" && linespaced=1
[ "$simpleclean" -eq 1 ] && LC_ALL=C perl -CS -i -pe 's#[\,|\.|\;|\:|\?|\!]#$&\n\n#g;' "$tmp_file" && linespaced=1
[ "$commareturn" -eq 1 ] && LC_ALL=C perl -CS -i -pe 's#\,#$&\n\n#g;' "$tmp_file" && LC_ALL=C perl -CS -i -pe 's#^ ##g;' "$tmp_file" && linespaced=1
LC_ALL=C perl -i -CS -ane 's(\r|`)//g; s/\.{2,}/.../g; $n=(@F==0) ? $n+1 : 0; print if $n<='"${linespaced}"'' "$tmp_file"
[ "$interface" -ge 1 ] && echo -e -n "\rFormatting..."
if [ "$initclean" -eq 1 ]; then
text=$(cat "$tmp_file")
text=$(clean "$text")
echo "$text" > "$tmp_file"
fi
[ "$removeline" != "" ] && LC_ALL=C perl -CSDA -ni -e 'print unless /'"${removeline}"'/;' "$tmp_file"
[ "$interface" -ge 4 ] && cat "$tmp_file" | less
textlines=$(wc -l < "$tmp_file")
info="[Script Info]\n\
ScriptType: v4.00+\n\
Collisions: Normal\n\
PlayDepth: 0\n\
Timer: 100,0000\n\
Video Aspect Ratio: 0\n\
WrapStyle: 0\n\
ScaledBorderAndShadow: no\n\
\n\
[V4+ Styles]\n\
Format: Name,Fontname,Fontsize,PrimaryColour,SecondaryColour,OutlineColour,BackColour,Bold,Italic,Underline,StrikeOut,ScaleX,ScaleY,Spacing,Angle,BorderStyle,Outline,Shadow,Alignment,MarginL,MarginR,MarginV,Encoding\n\
Style: $top,Arial,10,&H00F9FFFF,&H00FFFFFF,&H00000000,&H00000000,-1,0,0,0,100,100,0,0,1,1,0,8,10,10,10,0\n\
Style: $bot,Arial,18,&H00F9FFF9,&H00FFFFFF,&H00000000,&H00000000,-1,0,0,0,100,100,0,0,1,2,0,2,10,10,10,0\n\
\n\
[Events]\n\
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"
if [ "$mode" = "srt" ]; then
[ "$interface" -ge 4 ] && sdebug="$info"
echo -e $info > "$out_file"
else
echo -n > "$out_file"
fi
while IFS='' read -r lineN; do
lines=$(( lines + 1 ))
## Scroll down until text is found
[ "$lineN" = "" ] && while IFS='' read -r lineN; [ "$lineN" = "" ]; do lines=$(( lines + 1 )) ; done
if [ "$mode" = "srt" ]; then
IFS='' read -r lineT; IFS='' read -r lineS
s=$lineS
lines=$(( lines + 2 ))
else
lines=$(( lines + 1 ))
s=$lineN
fi
## Grab the other lines if more than 1
while IFS='' read -r lineS; [ -n "$lineS" ]; do
s=$s$newline$lineS
lines=$(( lines + 1 ))
done
s=$(echo "$s" | sed -r 's#\\n#'"${backnew}"'#g;')
[ "$deltag" -eq 1 ] && s=$(echo "$s" | LC_ALL=C perl -CA -pe 's#</?\S?>?##g;')
[ "$clean" -eq 1 ] && s=$(clean "$s")
if [ "$s" = "" ] && [ "$delblank" -eq 1 ]; then
[ "$interface" -ge 2 ] && echo "*** Blank line omitted ***"
if [ "$counting" = "file" ]; then
line=$(( line + 1 ))
chunkNum=$(( chunkNum + 1 ))
fi
else
if [ "$mode" = "srt" ]; then
## Grab and incorporate timeline info. This shell command is faster than awk, sed, and cut
timeS="Dialogue: 0,"${lineT:1:7}"."${lineT:9:2}","${lineT:18:7}"."${lineT:26:2}","
if [ "$nosource" -eq 1 ]; then
final1=$timeS$top",,0000,0000,0000,,(($line))"; final2=""
else
if [ "$top" = "$src_lang" ]; then
final1=$timeS$top",,0000,0000,0000,,"$s
final2=$timeS$bot",,0000,0000,0000,,(($line))"
else
final1=$timeS$top",,0000,0000,0000,,(($line))"
final2=$timeS$bot",,0000,0000,0000,,"$s
fi
fi
elif [ "$makesubs" -eq 1 ]; then
final1=$s
else
if [ "$nosource" -eq 0 ]; then
if [ "$top" = "$src_lang" ]; then
final1=$s
final2="(($line))"
else
final1="(($line))"
final2=$s
fi
else
final1="(($line))"
final2=""
fi
fi
if [ "$mode" = "srt" ] && [ "$makesubs" -ne 1 ]; then
[ "$interface" -ge 4 ] && sdebug="$sdebug$final1\n$final2\n"
printf '%s\n%s\n' "$final1" "$final2" >> "$out_file"
elif [ "$makesubs" -eq 1 ]; then
printf '%s\n00:00:00,000 --> 00:00:00,000\n%s\n\n' "$line" "$final1" >> "$out_file"
else
[ "$interface" -ge 4 ] && sdebug="$sdebug$final1\n\n$final2\n\n"
printf '%s\n\n%s\n\n\n' "$final1" "$final2" >> "$out_file"
fi
[ "$interface" -ge 1 ] && echo -n -e "Completed: \033[0;33m"$(( 100 - ( ($textlines * 1) / ($lines) ) ))"%\033[0m \r"
if [ $(( ${#srcClump[chunkTot]} + ${#s} )) -gt $charMax ]; then
srcClump[$chunkTot]="${srcClump[$chunkTot]} (($line)) "
srcClumpNum[$chunkTot]=$(( $line - 1 ))
chunkNum=0
chunkTot=$(( chunkTot + 1 ))
fi
srcClump[$chunkTot]="${srcClump[$chunkTot]} (($line)) $s"
srcLine[$line]="$s"
line=$(( line + 1 ))
chunkNum=$(( chunkNum + 1 ))
fi
done < "$tmp_file"
srcClumpNum[$chunkTot]=$(( $line - 1 ))
srcClump[$chunkTot]="${srcClump[$chunkTot]} (($line)) "
[ "$interface" -ge 4 ] && echo -e "$sdebug" | less
[ "$justformat" -eq 1 ] && exit
if [ "$makesubs" -eq 1 ]; then
echo -e "\n\nRun:\n"
video_file="${src_file/$src_ext/mp4}"
srt_file="${src_file/$src_ext/ssa}"
new_video="${video_file/./ (sub).}"
echo "gnome-subtitles '$out_file' '$video_file'"
echo "srtssa.sh -d '$out_file'"
echo "ffmpeg -i '$video_file' -vf ass='$srt_file' '$new_video'"
exit
fi
[ "$interface" -ge 1 ] && printf "\rTranslating...%-50s"
line=1
transTime=$(( $charMax / 3000 )) # How long does it take to pull the results from Google and Process them?
for (( chunkNum=0; chunkNum<=$chunkTot; chunkNum++ )); do
s=$(translation "${srcClump[chunkNum]}")
#s="${srcClump[chunkNum]}"
## Check and fix badly formatted brackets around line numbers
fixed=$(echo "$s" | perl -CA -pe '\
s#\(\s?\(\s?\(\s?(?=\d)# ((#g; \ ## (((10)) to ((10))
s#（# \(#g; s#）#\) #g; \
s#(\( \(|\(\( |\(\( |\( \( )(?=\d+)#\(\(#g; s#(?<!\()(\(|\( )(?=\d+)#\(\(#g; \
s#(?<=\d{1})(\) \)| \)\)| \)\)| \) \))#\)\)#g; s#(?<=\d{1})((?!\)\)) \)|(?!\)\))\))#\)\)#g; \
s#(?<=\S)\(\(# ((#g; s#\)\)(?=\S)#)) #g;')
## Latin translations will substitute decimal numbers for roman numerals, they need to be translated back
if [ "$tar_lang" = 'la' ]; then
for ((num=$line; num<=${srcClumpNum[chunkNum]}+1; num++ )); do
roman=$(roman "$num")
fixed=$(echo "$fixed" | sed 's#(('"${roman}"'))#(('"${num}"'))#g')
done
fi
[ "$interface" -ge 3 ] && echo -e "\n\nSource:\n\n\033[0;32m${srcClump[$chunkNum]}\033[0m\n\nTranslation:\n\n\033[0;35m$s\n\n\033[0mFixed Translation:\n\n\033[0;36m$fixed\n\n\033[0m"
for (( num1=$line; num1<=${srcClumpNum[chunkNum]}; num1++ )); do
num2=$(( num1 + 1 ))
new_line=$(echo "$fixed" | perl -CSDA -sae 'push @a,/(?<='"${num1}"'\)\) )(.*?)(?= \(\('"${num2}"')/; END{print @a }' | sed -e 's/^[[:space:]]*//g')
[ "$deluni" -eq 1 ] && new_line=$(echo "$new_line" | LC_ALL=C perl -CA -pe 's/\\u\S{4,}//g;')
echo "$new_line" | LC_ALL=C perl -CA -ne 'push @a,print "\nWARNING: UTF encoding found: $&\n" while /\\u\S{4}.*$/gs;'
new_line=$(echo "$new_line" | LC_ALL=C perl -CSDA -pe 's/\\u003c/</g; \
s/\\u003e/>/g; \
s/\s?(?<=(b|i|u))>\s?/>/g; \
s#\s?<\s?(?=/?(b|i|u))#<#g;')
[ "$cleantrans" -eq 1 ] && new_line=$(clean "$new_line")
## Check for errors
if [ "$noerr" -eq "0" ]; then
## Check for empty lines, brackets, or less than 3 characters-signs of a bad translation
echo "$new_line" | LC_ALL=C sed -n -r '/^$/{q1}; /^(\.|\?|\,)[^\.{2,}]/{q4}; / P /{q6}; /-->/{q7}; /\\u/{q8};'
#echo "$new_line" | LC_ALL=C sed -n -r '/^$/{q1}; /^(\.|\?|\,)[^\.{2,}]/{q4}; / P /{q6}; /-->/{q7}; '
error=$?
## Compare number of l&r brackets in both src and tar. If unequal, Google likely made a mistake
ls=${srcLine[num1]//[^(]}
rs=${srcLine[num1]//[^)]}
lt=${new_line//[^(]}
rt=${new_line//[^)]}
[ ${#lt} -ne ${#ls} ] || [ ${#rt} -ne ${#rs} ] && error=3
## Compare the difference in line lengths
a=${#new_line};b=${#srcLine[num1]}
difference=$((a*b*dif?(a>b?100*a/b:100*b/a)-100:0))
if [ "$difference" -gt "$dif" ]; then
c=$(echo "$new_line" | LC_ALL=C perl -CS -ne 'push @a,print 1 while /^\S+$/gs;')
d=$(echo "${srcLine[num1]}" | LC_ALL=C perl -CS -ne 'push @a,print 1 while /^\S+$/gs;')
## It's not an error if there was only one word for both source and target text
## One word was simply longer than the other and we can assume this is normal
if [ "$c" != "1" ] || [ "$d" != "1" ] || [ "$a" -le 2 ]; then error=5; fi
fi
if [ "$error" -ne 0 ]; then
old_line=$new_line
[ "$norepair" -eq 0 ] && new_line=$(translation "${srcLine[$num1]}")
[ "$cleantrans" -eq 1 ] && new_line=$(clean "$new_line")
if [ -n "$log" ]; then
log="\n\nFile: $src_file\n\nSource:\n\n${srcClump[$chunkNum]}\n\nTranslation:\n\n$s\n\nFixed Translation:\n\n$fixed\n\n(ERROR: $error) $num1 = \"${srcLine[$num1]}\"\n(ERROR: $error) $num1 ≠ \"$old_line\"\n(ERROR: $error) $num1 ≈ \"$new_line\"\n\n"
[ ! -f "$log_file" ] && log_file="${out_file/.$tar_ext/.log}" && touch "$log_file"
echo -e "$log" >> "$log_file"
echo -e "Err: $error\nFile: '$out_file'\n${srcLine[$num1]}\n$old_line\n$new_line\n\n" >> 'all.log'
fi
[ "$interface" -ge 2 ] && echo -e "\n\033[0;31m(ERROR: $error)\033[0m $num1 = \"\033[0;32m${srcLine[$num1]}\033[0m\"\n\033[0;31m(ERROR: $error)\033[0m $num1 ≠ \"\033[0;31m$old_line\033[0m\"\n\033[0;31m(ERROR: $error)\033[0m $num1 ≈ \"\033[0;35m$new_line\033[0m\"\n"
errors=$(( errors + 1 ))
if [ "$norepair" -eq 0 ]; then
for (( i=$wait; i>0; i-- )); do
[ "$interface" -ge 2 ] && echo -n -e "Waiting \033[0;33m$i\033[0m seconds... \r"
sleep 1
done
printf "\r%-50s\r"
fi
fi
fi
[ "$interface" -ge 2 ] && echo -e "$num1 = \"\033[0;32m${srcLine[$num1]}\033[0m\"\n$num1 ≈ \"\033[0;35m$new_line\033[0m\""
if [ "$new_line" = "" ] && [ "$delblank" -eq 1 ]; then
LC_ALL=C perl -CS -ni -e 'print unless /\(\('"${num1}"'\)\)/;' "$out_file"
else
LC_ALL=C perl -CS -i -pe 's`\(\('"${num1}"'\)\)`'"${new_line}"'`g;' "$out_file"
fi
line=$(( line + 1 ))
done
## Don't make user wait after all lines are translated
if [ "$chunkNum" -lt "$chunkTot" ]; then
for (( i=$wait; i>0; i-- )); do
perc=$(( (($chunkNum) * 100 + ( 100 - ($i * 100 / $wait))) / ($chunkTot) ))
seconds=$(( ($chunkTot - $chunkNum - 1 ) * ($wait + $transTime + ( $chunkTot / 100 )) + $i + $transTime ))
timeleft=$(date -d@$seconds -u +%H:%M:%S)
[ "$interface" -ge 1 ] && echo -e -n "\r[ $out_file ] \033[0;32m$src_lang\033[0m -> \033[0;35m$tar_lang\033[0m $timeleft (\033[0;33m$perc%\033[0m) " && [ $errors ] && echo -e -n "\033[0;31mErrors: $errors\033[0m"
[ "$gtk" -eq 1 ] && echo -e $perc > "$gui_file"
sleep 1
done
[ "$interface" -ge 1 ] && printf "\rTranslating...%-80s"
fi
done
if [ "$interface" -ge 1 ]; then [ "$errors" ] && echo -e "\n$out_file done with ERRORS: $errors\n" || echo -e "\n$src_file done. \n"; fi
@bathtime

bathtime commented May 9, 2021

@cheaster35

Sorry, I accidentally deleted the entire gist when trying to update the file. I re-uploaded a completely updated version.

If srtssa will not translate, you may need to run it with an alternative translator such as translate-shell (this can happen on Debian, as the perl version is inadequate):
https://github.com/soimort/translate-shell

Use the '--alt' parameter to make srtssa use this translator, and make sure the 'trans' executable is in the directory you run srtssa from (the script calls it as './trans').

Then try running srtssa to translate 'myfile.txt' (whatever file you want) to mynewfile.txt in French:
$ ./srtssa.sh -t fr --alt -i myfile.txt -o mynewfile.txt
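
Before running the command above, it may help to confirm that the translator is actually where srtssa expects it. The snippet below is only a sketch: `have_local_trans` is a hypothetical helper, not part of srtssa.sh. It assumes the translate-shell executable is named 'trans' and is run from the current directory, which matches the './trans' call in the script.

```shell
#!/bin/bash
# Hypothetical helper, not part of srtssa.sh: succeeds only if a regular,
# executable file named 'trans' exists in the given directory (default: .).
have_local_trans() {
    local dir="${1:-.}"
    [ -f "$dir/trans" ] && [ -x "$dir/trans" ]
}

if have_local_trans .; then
    echo "Found ./trans, so --alt should be usable from this directory."
else
    echo "No executable ./trans here; download it from the translate-shell page first."
fi
```

If the check fails even though the file is present, it is probably missing its execute permission; `chmod +x trans` should fix that.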
