ffmpeg is a command-line utility that presents an API for interacting with a variety of media types and encodings in a uniform fashion.
Depending on the
ffmpeg distribution, you may get access to utilities such as
ffprobe (which provides information on a file) and
ffplay (which plays back a file). Those tools are critical.
Those tools, by default, will show all the arguments that
ffmpeg was compiled with, which can get a little verbose. If you're going to run many ffmpeg commands, I suggest you get used to passing the
While you can often do just about everything media file related with
ffmpeg, that does not mean you should; I would recommend installing
sox, a command-line utility that primarily deals with audio files. It's nowhere near as feature complete, but the arguments are a lot easier to remember.
Uses of
ffmpeg are often sort of last-resort-ish: we do not like modifying our data in
/dat/corpora. While some audio transformations can be done without losing any data, doing so can cause other issues. When feasible, it's best to leave the files alone and have us modify our tools accordingly.
I'm going to start this bit with a rant. While I am perfectly fine with nonsensical names for things, please take care to pick names that are relatively easy to search for. If you name an audio encoding "shorten", please consider the headache you will cause for people googling "shorten audio file" or something along those lines.
Early in Fred's development, I got reports that Fred was unable to read one of the test audio files from the toolkit. The explanation I got was that the file was a NIST encoding. I thought this was odd, as the audio library I use to interact with audio files (
libsndfile) does support NIST files. So what gives? What makes this file so special?
ognyan@wfh:~$ ffprobe -hide_banner -i testSample.wav
Input #0, nistsphere, from 'testSample.wav':
  Metadata:
    microphone      : Sennheiser
    recording_site  : SRI
    database_id     : wsj1
    database_version: 1.0
    recording_environment: quiet
    speaker_session_number: 01
    session_utterance_number: 03
    prompt_id       : adapt.03
    utterance_id    : 44aa0103
    speaking_mode   : read-adaptation
    speaker_id      : 44a
    sample_min      : -854
    sample_max      : 683
    sample_checksum : 63835
    recording_date  : 02-Dec-1992
    recording_time  : 12:45:30.00
  Duration: 00:00:05.41, bitrate: 87 kb/s
    Stream #0:0: Audio: shorten, 16000 Hz, 1 channels, s16p
As you can see,
ffprobe gives all sorts of good information, but this bottom bit is the bit of interest:
Stream #0:0: Audio: shorten, 16000 Hz, 1 channels, s16p
It took a fair amount of googling to discover that shorten is an encoding scheme that was used ages ago. Our toolkit and wave package have support for it (which is why this file works in SView), but Fred does not.
I created an issue with
libsndfile to support it; they said they're open to a PR, but given that nobody (but us) uses shorten-encoded audio files, they aren't particularly motivated.
ffmpeg is able to play back this audio file via the
ffplay command with ease. A potential future feature I'll roll out down the line is to embed ffmpeg into Fred to handle the reading of data.
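If you want to hunt down other files like this one, ffprobe can be asked to print only the codec name instead of the full report. This is a minimal sketch, not something from our toolkit: the function names and the `*.wav` glob are my own assumptions for illustration.

```shell
# Print the codec name of the first audio stream, e.g. "shorten" or "pcm_s16le".
audio_codec() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# List every .wav file in a directory whose first audio stream uses the given codec.
# (Hypothetical helper; adjust the glob to match your file naming.)
find_codec() {
    codec=$1
    dir=$2
    for f in "$dir"/*.wav; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        if [ "$(audio_codec "$f")" = "$codec" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like `find_codec shorten /path/to/audio` would then list the files Fred cannot read, without touching any of them.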
I came across this issue almost by accident, as I was added to an email chain about corrupt audio files. Only a handful of the files were corrupt, which struck me as somewhat odd.
First step, on macOS, there is a command line utility,
ognyan@wfh:~$ afplay ambiance_20200412_21h.wav
This exits immediately with no audio playing. Given this, and the email report saying our tools were not handling this file, clearly something was wrong.
ognyan@wfh:~$ ffplay -hide_banner ambiance_20200412_21h.wav
Here we can hear the audio, so what gives?
ognyan@wfh:~$ ffprobe -hide_banner ambiance_20200412_21h.wav
[wav @ 0x7fe3a3808200] Estimating duration from bitrate, this may be inaccurate
Input #0, wav, from 'ambiance_20200412_21h.wav':
  Duration: 01:08:55.68, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ( / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
So it has this warning about the duration...
A quick Google search turned up a
sox command that can patch it up:
ognyan@wfh:~$ sox --ignore-length ambiance_20200412_21h.wav fixed.wav
After this, we can verify we don't get the warning
ognyan@wfh:~$ ffprobe -hide_banner fixed.wav
Input #0, wav, from 'fixed.wav':
  Duration: 01:08:55.68, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ( / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
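Since only a handful of files were affected, it can help to check a batch for the same warning rather than repairing everything. Here is a rough sketch of that idea; the function names and the `.fixed.wav` suffix are my own assumptions, and I'm assuming the warning text stays the same across ffprobe versions.

```shell
# Succeeds when ffprobe prints the "Estimating duration from bitrate" warning
# for a file, which is the symptom the broken file above showed.
has_length_warning() {
    ffprobe -hide_banner "$1" 2>&1 | grep -q 'Estimating duration from bitrate'
}

# Write a repaired copy next to each affected file; originals are left untouched.
fix_lengths() {
    for f in "$@"; do
        if has_length_warning "$f"; then
            sox --ignore-length "$f" "${f%.wav}.fixed.wav"
        fi
    done
}
```

Running `fix_lengths *.wav` in a scratch copy of the directory would only create new files for the ones that trigger the warning.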
I have one four channel file, can I get four one channel files?
I saw a message from Grace requesting this late at night. There were already some email chains going around regarding 4 channel audio files, but after asking for the path to the file, the first thing I did was make sure it actually has 4 channels:
ognyan@wfh:~$ ffprobe -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav
Input #0, wav, from '200629_prePV_SET_Max_Noise_micin.wav':
  Duration: 00:15:36.00, bitrate: 1024 kb/s
    Stream #0:0: Audio: pcm_s16le ( / 0x0001), 16000 Hz, 4 channels, s16, 1024 kb/s
yup...4 channels... who does that... anyway
Splitting it up is pretty easy with
ffmpeg, though I still had to google the command. Before I show you what I did, I want to reference this awesome Stack Overflow post:
ognyan@wfh:~$ ffmpeg -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav \
    -map_channel 0.0.0 first.wav \
    -map_channel 0.0.1 second.wav \
    -map_channel 0.0.2 third.wav \
    -map_channel 0.0.3 fourth.wav
This generates 4 files,
fourth.wav, each corresponding to one audio channel of the input.
The link above showcases a few other ways of handling this as well.
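One of those other ways is the `pan` audio filter, which is worth knowing because, as far as I can tell, newer ffmpeg releases have deprecated `-map_channel`. The sketch below runs ffmpeg once per channel, which is slower than a single invocation but easy to read. The function names and output naming scheme are my own, not anything from the post.

```shell
# Extract channel N (0-based) of a multi-channel file into its own mono file.
# "pan=mono|c0=cN" builds a mono output whose only channel is input channel N.
extract_channel() {
    src=$1
    chan=$2
    dst=$3
    ffmpeg -hide_banner -i "$src" -af "pan=mono|c0=c$chan" "$dst"
}

# Split an n-channel file into n mono files named <prefix>_0.wav, <prefix>_1.wav, ...
split_channels() {
    in=$1
    n=$2
    prefix=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        extract_channel "$in" "$i" "${prefix}_$i.wav"
        i=$((i + 1))
    done
}
```

For the file above that would be `split_channels 200629_prePV_SET_Max_Noise_micin.wav 4 channel`.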
what is an amr file?
This wasn't really a request, but an issue was brought up about some files that were in "amr" format.
After getting the path to a file:
ognyan@wfh:~$ ffplay -hide_banner -i 2_26134184.amr
Sure enough, audio plays fine... now Stan did not request this, but let's say we want to convert the file into some kind of usable wav file.
ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr file.wav
You can get fancy and specify a different sample rate if you want, or convert to some other fancy audio encoding:
ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr -c:a libopus file.ogg
Here, we said to use the
libopus audio codec, which can be stored in
ogg type audio files.
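Going back to the sample rate remark: `-ar` sets the output sample rate, `-ac` the number of channels, and `-c:a` the audio codec. Here is a small sketch that wraps those into one helper; the function name and the 16000 Hz default are my own assumptions (16 kHz simply matches the other files in this post).

```shell
# Convert any ffmpeg-readable audio file to mono 16-bit PCM WAV.
# Usage: to_wav <input> <output.wav> [sample_rate]
to_wav() {
    src=$1
    dst=$2
    rate=${3:-16000}   # assumed default; pass a third argument to override
    ffmpeg -hide_banner -i "$src" -ar "$rate" -ac 1 -c:a pcm_s16le "$dst"
}
```

For example, `to_wav 2_26134184.amr file.wav` or `to_wav 2_26134184.amr file.wav 8000`. Keep in mind that resampling upward does not add any information that was not in the original recording.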
I have a cute video of my cats playing...but I have a bazillion of these videos and I'd like to squish it down some...
ognyan@wfh:~$ ffmpeg -i original.MOV \
    -c:v libx265 \
    -crf 28 \
    -c:a copy \
    first.mp4
Let's say I know I'm going to play this on a Chromecast, which does have native H.265 decoding capability, but I want to enable some flags to make the video a bit easier to play back:
ognyan@wfh:~$ ffmpeg -i original.MOV \
    -c:v libx265 \
    -crf 28 \
    -level 4.1 \
    -tune fastdecode \
    -c:a copy \
    second.mp4
Now let's say, I have a fair amount of time to encode this video, but I really want to compress the file size.
ognyan@wfh:~$ ffmpeg -i original.MOV \
    -c:v libx265 \
    -preset veryslow \
    -crf 36 \
    -level 4.1 \
    -tune fastdecode \
    -c:a copy \
    third.mp4
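Picking a `-crf` value is mostly trial and error, so before committing to a long encode it can be worth encoding a short sample at a few values and comparing sizes and quality by eye. A minimal sketch of that; the function name, the 30 second sample length, and the output file names are all my own choices.

```shell
# Encode the first 30 seconds of a video at several CRF values so the
# size/quality trade-off can be compared before a full encode.
# Usage: crf_samples <input> <crf> [<crf> ...]
crf_samples() {
    src=$1
    shift
    for crf in "$@"; do
        # -t before -i limits how much of the input is read; -y overwrites old samples
        ffmpeg -hide_banner -y -t 30 -i "$src" \
            -c:v libx265 -crf "$crf" -c:a copy "sample_crf$crf.mp4"
    done
}
```

After `crf_samples original.MOV 24 28 32 36`, an `ls -lh sample_crf*.mp4` shows how quickly the file size drops as the CRF goes up.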
Making awful Star Wars DVD Rip Usable
Sorry, don't have the original on hand, so you'll have to bear with my explanations.
I have a Star Wars Theatrical Release DVD from China (from ~20 years ago). While the movie on it certainly worked, there were some very significant problems regarding playback.
- Hard-coded letterbox bars: while the video was in a widescreen format, the people who made it decided to force it into a 4:3 aspect ratio by adding black bars above and below the picture. When you played the video on a 16:9 screen (the current standard) you would also get black bars on the sides; the end result was a very thick black border all the way around the video, taking up a majority of the pixels on the screen
- The subtitles were in VobSub format, which are effectively bitmaps. On high resolution screens they look like rubbish, and for media playback software such as Plex, they make transcoding substantially more difficult
- The video encoding was something ancient (maybe MPEG1?), which is not only horribly inefficient, but a lot of modern video players (e.g. VLC) would display many artifacts throughout playback, and skipping forward or backward was problematic as well.
First thing to address was getting rid of those black bars.
ognyan@wfh:~$ ffmpeg -i input.mkv -vf cropdetect=24:16:0 dummy.mkv
...
[Parsed_cropdetect_0 @ 0x3704360] x1:0 x2:639 y1:43 y2:317 w:640 h:272 x:0 y:46 pts:181320 t:181.320000 crop=640:272:0:46
...
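cropdetect prints one of these lines for nearly every frame, and the suggested value can wobble a bit during dark scenes, so rather than eyeballing the log it can be easier to pull out the value that shows up most often. A rough sketch of that; the function name is my own.

```shell
# Read ffmpeg's cropdetect log on stdin and print the most frequently
# suggested crop=W:H:X:Y value.
most_common_crop() {
    sed -n 's/.*\(crop=[0-9:]*\).*/\1/p' |
        sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}'
}
```

It would be used along the lines of `ffmpeg -i input.mkv -vf cropdetect=24:16:0 -f null - 2>&1 | most_common_crop`, where `-f null -` throws the decoded video away instead of writing a dummy file.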
Here the bit we care about is
ognyan@wfh:~$ ffmpeg -i Star\ Wars\ -\ Episode\ VI\ -\ Return\ of\ the\ Jedi.mkv \
    -vf crop=704:272:8:104 \
    -aspect 704:272 \
    -c:v libx264 \
    -crf 17 \
    -c:a copy \
    -profile:v high \
    -preset medium \
    -tune fastdecode \
    -tune film \
    -tune grain \
    -level 4.1 \
    -movflags +faststart \
    output.mkv
In this command, I do the following operations:
- crop the video (to a different value as this is a different video)
- set the
704:272, figuring this would guarantee that the video would not be stretched inappropriately in either direction
- re-encode the video to
libx264 (the original encoding does not matter)
- I set the "constant rate factor" (
crf) to 17, which should make the output indistinguishable from the source material (
crf of 0 is technically lossless, but you would get huge output file sizes)
- -c:a copy means copy the audio and do nothing to it.
- -profile:v high is an
h264-specific encoding parameter that restricts which encoding features are used for the video. Without this argument, the video is compatible with the largest variety of devices, but virtually all modern phones, tablets, video players, etc. support the
high profile setting, so this allows for slightly smaller file sizes as a result
- -preset medium: as discussed, the preset settings have to do with encoding speed; the idea is that for a slower preset, the quality of the result will be higher for the same file size. This should be set to the slowest setting you can tolerate, often
- -tune <parameter>: more can be seen about these in the ffmpeg wiki documentation for x264 encoding.
- -level 4.1 is another
x264 setting. By specifying the level, I reduce compatibility of the output video; however, this setting is compatible with virtually all my devices, Chromecasts, Raspberry Pis, etc.
- -movflags +faststart: this setting is only applicable to some output container formats, but the idea here is that the video/audio index is placed at the very beginning of the file (not compatible with all applications, but offers significant benefits such as videos being able to start right away).
While this takes care of the video, I still have problems with the subtitles. While I'm sure there is a way to deal with this using ffmpeg alone, I elected to use tools such as
tesseract (which is usable with ffmpeg) and
First, I identify subtitles
ognyan@wfh:~$ mkvinfo some_movie.mkv
| + Track
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
|  + Track UID: 3
|  + Track type: subtitles
|  + Enabled: 1
|  + Default track flag: 1
|  + Forced track flag: 0
|  + Lacing flag: 0
|  + Minimum cache: 0
|  + Maximum block additional ID: 0
|  + Codec ID: S_VOBSUB
|  + Codec decode all: 1
|  + Language: eng
|  + Codec's private data: size 508
| + Track
|  + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
|  + Track UID: 4
|  + Track type: subtitles
|  + Enabled: 1
|  + Default track flag: 0
|  + Forced track flag: 0
|  + Lacing flag: 0
|  + Minimum cache: 0
|  + Maximum block additional ID: 0
|  + Codec ID: S_VOBSUB
|  + Codec decode all: 1
|  + Language: fre
|  + Codec's private data: size 508
| + Track
|  + Track number: 5 (track ID for mkvmerge & mkvextract: 4)
|  + Track UID: 5
|  + Track type: subtitles
|  + Enabled: 1
|  + Default track flag: 0
|  + Forced track flag: 0
|  + Lacing flag: 0
|  + Minimum cache: 0
|  + Maximum block additional ID: 0
|  + Codec ID: S_VOBSUB
|  + Codec decode all: 1
|  + Language: spa
|  + Codec's private data: size 508
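I can see that tracks 3, 4, and 5 are subtitle tracks. Note that mkvextract addresses them by the track ID shown in parentheses (2, 3, and 4), not by the track number. Since these are VobSub tracks, each one comes out as a .sub/.idx pair. I then extract them:
ognyan@wfh:~$ mkvextract tracks \
    some_movie.mkv \
    2:some_movie.eng.sub \
    3:some_movie.fre.sub \
    4:some_movie.spa.sub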
Here, I had to manually install a tool that's not part of the homebrew distribution:
ognyan@wfh:~$ brew install --with-all-languages tesseract
ognyan@wfh:~$ brew install --HEAD https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt.rb
I then create my
ognyan@wfh:~$ vobsub2srt some_movie.eng
ognyan@wfh:~$ vobsub2srt some_movie.fre
ognyan@wfh:~$ vobsub2srt some_movie.spa
ass is considered the most versatile/compatible format, so I use ffmpeg to convert the subtitles from
ognyan@wfh:~$ ffmpeg -i some_movie.eng.srt some_movie.eng.ass
I then add the subtitle files back into the video:
ognyan@wfh:~$ ffmpeg -i some_movie.mkv -i some_movie.eng.ass \
    -codec copy \
    -map 0 \
    -map 1 \
    -metadata:s:s:0 language=eng \
    output.mkv
bonus hardware support! (cuda)
ffmpeg has cuda support for video transcoding and some filters... I won't go into specific use-cases but you can generally transcode from most video formats, to either
h265 ... in addition the cuda support does support some video filters such as scaling.
The hardware-accelerated ffmpeg offers significant performance improvements (testing some videos, I was transcoding at almost 200x playback speed vs. 2-5x).
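Just to give a flavor of what a GPU transcode looks like, here is a hedged sketch. It assumes an NVIDIA card and an ffmpeg build with CUDA/NVENC enabled (`ffmpeg -hwaccels` should list `cuda`); the function name and the 1280x720 default are my own, and these are not the exact commands I ran.

```shell
# Decode, scale, and encode to H.265 on the GPU; audio is copied untouched.
# Usage: gpu_transcode <input> <output> [width] [height]
gpu_transcode() {
    src=$1
    dst=$2
    width=${3:-1280}    # assumed default output size
    height=${4:-720}
    # -hwaccel_output_format cuda keeps decoded frames in GPU memory so that
    # scale_cuda and hevc_nvenc can work on them without a round trip to the CPU
    ffmpeg -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i "$src" \
        -vf "scale_cuda=$width:$height" \
        -c:v hevc_nvenc -c:a copy "$dst"
}
```

For example, `gpu_transcode input.mkv output.mkv` or `gpu_transcode input.mkv output.mkv 1920 1080`. Swapping `hevc_nvenc` for `h264_nvenc` gives H.264 output instead.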