@eggplants
Last active August 3, 2024 15:34
Generate .srt file and translate into different language

Environment

$ uname -vorm
6.8.0-38-generic #38-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun  7 15:25:01 UTC 2024 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble
$ python -V
Python 3.12.3

Requirements

pipx install yt-dlp whisper-ctranslate2 subt
sudo apt install ffmpeg
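
Before running the pipeline, it can help to confirm that all four commands are actually on PATH. The sketch below is one way to do that; the function name `missing_tools` is hypothetical and not part of any of these tools.

```shell
# Print the name of each required command that is not found on PATH.
# With no arguments, check the four tools used in this gist.
# missing_tools is a hypothetical helper name.
missing_tools() {
  if [ "$#" -eq 0 ]; then
    set -- yt-dlp whisper-ctranslate2 subt ffmpeg
  fi
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Calling `missing_tools` with no arguments prints nothing when everything is installed.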

Run

# Download sample video (zoo.mp4)
yt-dlp 'https://www.youtube.com/watch?v=jNQXAC9IVRw' -o 'zoo.%(ext)s'

# Generate srt file from video (zoo.srt)
whisper-ctranslate2 zoo.mp4

# Translate subtitle from English into Japanese (zoo.translated.srt)
subt 'zoo.srt' -d ja

# Embed the subtitles as a selectable (soft) subtitle track (zoo.translated_1.mp4)
ffmpeg -i zoo.mp4 -i zoo.translated.srt -c copy -c:s mov_text zoo.translated_1.mp4

# Burn the subtitles into the video frames as text (zoo.translated_2.mp4)
ffmpeg -i zoo.mp4 -vf subtitles=zoo.translated.srt zoo.translated_2.mp4
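
The steps above can be wrapped in a small function so that the URL, file stem, and target language are parameters. This is only a sketch under assumptions: `run`, `subtitle_pipeline`, and the `DRY_RUN` variable are hypothetical names, the download is assumed to produce an `.mp4` file, and `subt` is assumed to write `<stem>.translated.srt` as it does in this gist. With `DRY_RUN=1` the commands are printed instead of executed (quoting is not preserved in the printed form, so it is meant for review rather than copy and paste).

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# subtitle_pipeline URL STEM LANG
#   STEM: base file name, e.g. zoo
#   LANG: target language code passed to subt, e.g. ja
# Assumes the download ends up as STEM.mp4.
subtitle_pipeline() {
  url=$1 stem=$2 lang=$3
  run yt-dlp "$url" -o "$stem.%(ext)s" &&
  run whisper-ctranslate2 "$stem.mp4" &&
  run subt "$stem.srt" -d "$lang" &&
  run ffmpeg -i "$stem.mp4" -vf "subtitles=$stem.translated.srt" "$stem.translated.mp4"
}
```

For example, `DRY_RUN=1 subtitle_pipeline 'https://www.youtube.com/watch?v=jNQXAC9IVRw' zoo ja` lists the four commands without running them.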

Result

zoo.srt

1
00:00:00,000 --> 00:00:05,000
Alright, so here we are, one of the elephants.

2
00:00:05,000 --> 00:00:13,000
The cool thing about these guys is that they have really, really, really long trunks.

3
00:00:13,000 --> 00:00:16,000
And that's cool.

4
00:00:16,000 --> 00:00:19,000
And that's pretty much all there is to say.

zoo.translated.srt

1
00:00:00,000 --> 00:00:05,000
さて、ここに私たちは象の一人です。

2
00:00:05,000 --> 00:00:13,000
これらの人のクールなことは、彼らが本当に、本当に、本当に長い幹を持っているということです。

3
00:00:13,000 --> 00:00:16,000
そして、それはクールです。

4
00:00:16,000 --> 00:00:19,000
そして、それはほとんどすべてです。
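
Translation should change only the text of each cue, not its timing, as in the two listings above. A quick sanity check is to compare the timing lines of both files. The helper names `srt_timings` and `same_timings` below are hypothetical, and the pattern assumes the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` SRT timing format.

```shell
# Print only the timing lines of an SRT file (carriage returns stripped).
srt_timings() {
  grep -E '^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}' "$1" | tr -d '\r'
}

# Succeed only if both files have at least one cue and identical timings
# in the same order.
same_timings() {
  a=$(srt_timings "$1") && b=$(srt_timings "$2") && [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: `same_timings zoo.srt zoo.translated.srt && echo "timings match"`.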

zoo.translated_1.mp4

zoo.translated_1.mp4

In a video player that supports subtitle tracks, the result looks like this:

Screencast.from.2024-07-28.20-26-23.webm

zoo.translated_2.mp4

@eggplants (author) commented:

yt-dlp "https://x.com/anadoluajansi/status/1819472080540241949" -o turkish.mp4
whisper-ctranslate2 --language=tr turkish.mp4
subt -s tr -d ja turkish.srt
ffmpeg -i turkish.mp4 -vf subtitles=turkish.translated.srt turkish.translated.mp4
turkish.translated.mp4

@eggplants (author) commented:

Real time: 27.710 s

  • 91.71s user 5.99s system 352% cpu 27.710 total
time (
  yt-dlp "https://x.com/anadoluajansi/status/1819472080540241949" -o turkish.mp4
  whisper-ctranslate2 --language=tr turkish.mp4
  subt -s tr -d ja turkish.srt
  ffmpeg -i turkish.mp4 -vf subtitles=turkish.translated.srt turkish.translated.mp4
)
[twitter] Extracting URL: https://x.com/anadoluajansi/status/1819472080540241949
[twitter] 1819472080540241949: Downloading guest token
[twitter] 1819472080540241949: Downloading GraphQL JSON
[twitter] 1819472080540241949: Downloading m3u8 information
[info] 1819471932779040768: Downloading 1 format(s): hls-542+hls-audio-128000-Audio
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 19
[download] Destination: turkish.fhls-542.mp4
[download] 100% of    2.83MiB in 00:00:00 at 9.20MiB/s
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 19
[download] Destination: turkish.fhls-audio-128000-Audio.mp4
[download] 100% of  902.65KiB in 00:00:00 at 3.01MiB/s
[Merger] Merging formats into "turkish.mp4"
Deleting original file turkish.fhls-audio-128000-Audio.mp4 (pass -k to keep)
Deleting original file turkish.fhls-542.mp4 (pass -k to keep)
Detected language 'Turkish' with probability 1.000000
[00:00.000 --> 00:04.720]  Benim için önemli olan insanların gönlünü kazanmak bu ama bakıyorum bir sürü yorumlar falan var.
[00:04.720 --> 00:08.360]  Olumlu yorumlarda var. Dünyada bayağı bir ilk sıralaya geçmişiz.
[00:08.360 --> 00:14.400]  Hadi Türkiye neyse de Amerika'da, Çin'de, Japonya'da bir Koreli bir kızla karşılaştırmışlar.
[00:14.400 --> 00:17.440]  İşte olimpiyatın en cool iki sporcusu.
[00:17.440 --> 00:20.240]  Sonra bir oyun kahramanı yapmışlar.
[00:20.240 --> 00:23.920]  Bayağı enteresan. Yani gözlüğü falan çok şey yapmışlar.
[00:23.920 --> 00:26.080]  Pozisyonu gündeme almışlar.
[00:26.240 --> 00:29.240]  O Elon Musk bile yazmış yani.
[00:29.240 --> 00:31.960]  Çok güzel bir t-shirt basmışlar evet.
[00:31.960 --> 00:36.800]  Tabii bunlar güzel şeyler. İlk başta kendime ülkem adına çok mutlu oruyorum.
[00:36.800 --> 00:38.960]  Bayağı güzel şeyler yazılmış yani.
[00:56.080 --> 00:59.800]  Altyazı M.K.
Transcription results written to '/home/eggplants/Videos/test' directory
Saved: './turkish.translated.srt'
ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)
  configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'turkish.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.16.100
  Duration: 00:00:56.90, start: 0.000000, bitrate: 550 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 720x1280, 416 kb/s, SAR 1:1 DAR 9:16, 25 fps, 25 tbr, 2500k tbn (default)
    Metadata:
      handler_name    : Twitter-vork muxer
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : Twitter-vork muxer
      vendor_id       : [0][0][0][0]
[Parsed_subtitles_0 @ 0x5e593b68ff40] libass API version: 0x1701000
[Parsed_subtitles_0 @ 0x5e593b68ff40] libass source: tarball: 0.17.1
[Parsed_subtitles_0 @ 0x5e593b68ff40] Shaper: FriBidi 1.0.13 (SIMPLE) HarfBuzz-ng 8.3.0 (COMPLEX)
[Parsed_subtitles_0 @ 0x5e593b68ff40] Using font provider fontconfig
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
[Parsed_subtitles_0 @ 0x5e593b68bac0] libass API version: 0x1701000
[Parsed_subtitles_0 @ 0x5e593b68bac0] libass source: tarball: 0.17.1
[Parsed_subtitles_0 @ 0x5e593b68bac0] Shaper: FriBidi 1.0.13 (SIMPLE) HarfBuzz-ng 8.3.0 (COMPLEX)
[Parsed_subtitles_0 @ 0x5e593b68bac0] Using font provider fontconfig
[Parsed_subtitles_0 @ 0x5e593b68bac0] fontselect: (Arial, 400, 0) -> /usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf, 0, LiberationSans
[Parsed_subtitles_0 @ 0x5e593b68bac0] Glyph 0x3053 not found, selecting one more font for (Arial, 400, 0)
[Parsed_subtitles_0 @ 0x5e593b68bac0] fontselect: (Arial, 400, 0) -> /usr/share/fonts/opentype/NotoSansCJKjp/NotoSansCJKjp-Regular.otf, 0, NotoSansCJKjp-Regular
[libx264 @ 0x5e593b601740] using SAR=1/1
[libx264 @ 0x5e593b601740] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
[libx264 @ 0x5e593b601740] profile High, level 3.1, 4:2:0, 8-bit
[libx264 @ 0x5e593b601740] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'turkish.translated.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.16.100
  Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 720x1280 [SAR 1:1 DAR 9:16], q=2-31, 25 fps, 12800 tbn (default)
    Metadata:
      handler_name    : Twitter-vork muxer
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.31.102 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : Twitter-vork muxer
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.31.102 aac
[out#0/mp4 @ 0x5e593b54f100] video:5130kB audio:960kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.712482%
frame= 1421 fps=121 q=-1.0 Lsize=    6133kB time=00:00:56.87 bitrate= 883.3kbits/s speed=4.84x
[libx264 @ 0x5e593b601740] frame I:7     Avg QP:17.09  size: 67405
[libx264 @ 0x5e593b601740] frame P:569   Avg QP:20.42  size:  7119
[libx264 @ 0x5e593b601740] frame B:845   Avg QP:23.44  size:   863
[libx264 @ 0x5e593b601740] consecutive B-frames:  3.2% 50.7%  5.9% 40.3%
[libx264 @ 0x5e593b601740] mb I  I16..4: 15.5% 59.6% 24.9%
[libx264 @ 0x5e593b601740] mb P  I16..4:  1.4%  4.1%  0.7%  P16..4: 18.0%  5.5%  2.6%  0.0%  0.0%    skip:67.8%
[libx264 @ 0x5e593b601740] mb B  I16..4:  0.1%  0.2%  0.0%  B16..8:  8.5%  0.8%  0.1%  direct: 0.3%  skip:90.1%  L0:40.1% L1:54.3% BI: 5.6%
[libx264 @ 0x5e593b601740] 8x8 transform intra:65.1% inter:72.2%
[libx264 @ 0x5e593b601740] coded y,uvDC,uvAC intra: 40.6% 48.9% 17.7% inter: 3.6% 4.3% 0.2%
[libx264 @ 0x5e593b601740] i16 v,h,dc,p: 35% 36% 10% 19%
[libx264 @ 0x5e593b601740] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 23% 25%  4%  5%  4%  6%  4%  4%
[libx264 @ 0x5e593b601740] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 24% 11%  6%  8%  7%  9%  6%  6%
[libx264 @ 0x5e593b601740] i8c dc,h,v,p: 55% 22% 18%  6%
[libx264 @ 0x5e593b601740] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x5e593b601740] ref P L0: 63.8% 12.7% 18.6%  4.9%
[libx264 @ 0x5e593b601740] ref B L0: 83.7% 13.8%  2.5%
[libx264 @ 0x5e593b601740] ref B L1: 98.2%  1.8%
[libx264 @ 0x5e593b601740] kb/s:739.19
[aac @ 0x5e593bdf8f40] Qavg: 630.193
( yt-dlp "https://x.com/anadoluajansi/status/1819472080540241949" -o ;   ;  -)  91.71s user 5.99s system 352% cpu 27.710 total
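
The `time ( ... )` call above reports a single total for all four steps. To see roughly where the time goes, each step could be timed on its own. The sketch below uses a hypothetical helper named `timed`; it measures whole seconds with `date +%s`, so it is coarser than the shell's `time` keyword.

```shell
# Run a command and report its wall-clock duration in whole seconds on stderr.
# The command's exit status is passed through unchanged.
# timed is a hypothetical helper name.
timed() {
  label=$1
  shift
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  printf '%s: %ss\n' "$label" "$((end - start))" >&2
  return "$rc"
}

# Example usage with the commands from this comment:
#   timed download   yt-dlp "https://x.com/anadoluajansi/status/1819472080540241949" -o turkish.mp4
#   timed transcribe whisper-ctranslate2 --language=tr turkish.mp4
#   timed translate  subt -s tr -d ja turkish.srt
#   timed burn-in    ffmpeg -i turkish.mp4 -vf subtitles=turkish.translated.srt turkish.translated.mp4
```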