curiousjp's ffmpeg meme cookbook

This is a place to gather "recipes" for various types of common video memes, now that Discord has made the format ubiquitous. I put these files together using ffmpeg, running it from the command line in a Windows Subsystem for Linux instance. Image editing is done in GIMP where necessary, and youtube-dl is used to fetch video and audio files for input.

Eventually, I hope to have a basic tutorial here on how ffmpeg's worldview around video editing operates - the idea of the filter graph. Until then, I point you to the overall filtering guide and the filter documentation reference. The basic idea, however, is that our video and audio streams flow through a chain of filters and modifiers from our inputs to our final results.
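As a tiny sketch of that idea (input.mp4 and the filters here are placeholders, not from any recipe below): the video stream of the first input, [0:v], flows through hflip and then scale, and the labelled result [out] is mapped to the output file.

ffmpeg -i input.mp4 -filter_complex " \
    [0:v]hflip,scale=w=640:h=-1[out]" \
-map "[out]" -y flipped.mp4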

Examples

Examples on this page have often been resized down further than the given recipe produces, to save space.

Holding a frame still

Pausing at a certain timestamp, for a certain period of time - this can be used for jumpscare effects. Our steps will be:

  1. Split our input into a "front" and "back" part. We will do this using two instances of the trim filter.
  2. Pad the "front" part with extra copies of its last frame, using tpad.
  3. Join the "front" and "back" parts back together using concat.

Here is an example. The input file is sticks-yuki.gif and the output file is pause-yuki.gif. We will hold on frame 14 (counting from zero - hence end_frame=15 below), for ten frames:

ffmpeg -i sticks-yuki.gif -filter_complex " \
    [0:v]trim=end_frame=15,tpad=stop=10:stop_mode=clone[front]; \
    [0:v]trim=start_frame=15[back]; \
    [front][back]concat=n=2:v=1" pause-yuki.gif

Here, trim and tpad received their arguments as frame numbers, but they could also be given as time durations. One caveat: in the case of a gif, this does not reoptimise the file to e.g. just set a long disposal time on the held frame - it creates a bunch of frame copies that in most cases won't do anything.
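For example, assuming the gif plays at 10 frames per second (check yours with ffprobe - the rate here is an assumption), the same hold could be written with durations instead: trim takes end in seconds, and tpad takes stop_duration.

ffmpeg -i sticks-yuki.gif -filter_complex " \
    [0:v]trim=end=1.5,tpad=stop_duration=1:stop_mode=clone[front]; \
    [0:v]trim=start=1.5[back]; \
    [front][back]concat=n=2:v=1" pause-yuki.gif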

Stutter and replay

I take the ending of my input file and repeat it three times, with the audio replaced by an SFX sample. The three plays are cut short, as when someone skips backwards to watch a replay, or a record being scratched back. Finally, the clip and the SFX audio are allowed to play out fully, but in slow motion.

This has quite a few stages, and one approach might be to use intermediate files. I have tried to do it here in one step.

  1. First, we'll split out the ending section (trim), and make two copies of it (split).
  2. We take one copy, and truncate it down to 0.5 seconds long (trim).
  3. We take the other copy, and slow it down to play at half speed (setpts).
  4. We take the audio file, and use it twice - here by simply referencing [1:a] in two separate chains (asplit would also work).
  5. We take one copy, and truncate it down to 0.5 seconds long (atrim).
  6. We take the other copy, and slow it down to play at half speed (atempo).
  7. We take our 0.5 second long video and audio streams and duplicate them so we have two additional copies of each (split and asplit).
  8. Finally, we combine all of these together with concat.
ffmpeg -i limitflip.mp4 -i pipe.m4a -filter_complex " \
    [0:v]trim=start=11.5,setpts=PTS-STARTPTS,split=2[ending1][ending2]; \
    [ending1]trim=start=0:end=0.5, split=3[sec1][sec2][sec3]; \
    [ending2]setpts=2*PTS[sec4]; \
    [1:a]atrim=start=0:end=0.5, asplit=3[sec1a][sec2a][sec3a]; \
    [1:a]atempo=0.5[sec4a]; \
    [0:v][0:a][sec1][sec1a][sec2][sec2a][sec3][sec3a][sec4][sec4a]concat=5:v=1:a=1[vfin][afin]; \
    [vfin]scale=w=iw/2:h=-1[vs]" \
-map "[vs]" -map "[afin]" -y limit-pipe.mp4

Something to look out for here is the use of setpts on the first line - if this isn't done, the internal timestamps on the trimmed video won't be reset, so attempting to then trim from 0 to 0.5 on the second line would seek into the blank area removed by the first trim. The reset isn't needed after the second trim, as it already starts from zero.
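If you want to convince yourself of this, render just the re-trimmed segment on its own - with the setpts in place you get the expected half second of video, and with it removed, an empty or glitched file (a sketch using the same input as above):

ffmpeg -i limitflip.mp4 -filter_complex " \
    [0:v]trim=start=11.5,setpts=PTS-STARTPTS, \
    trim=start=0:end=0.5[v]" \
-map "[v]" -y trim-check.mp4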

Looped animation + transposition + chromakey with PNG transparency

I have a meme PNG with a transparent background - I want to isolate an animated character from another video, and have two copies of it animated in the transparency area. I want the final result to be a looping animated GIF.

The looping animated characters can be found on frames 29 to 99 of the original video, after skipping forward 3 minutes and 25 seconds. The transparent area on the PNG is towards the bottom, so the isolated frames are padded and translated into the correct positions. The padding should be white - but as the two copies will be overlaid with one another, one character receives a transparent pad (black@0). Converting full-colour MP4 video to GIF can be a bit fraught - the last steps of this graph take the final video stream, analyse it to build an optimal palette, and then apply that palette.

There are a lot of magic numbers in this example - seek times, frame numbers, cropping windows, etc - these were all determined using trial and error.

ffmpeg -ss 03:25 -i ymo_taiso.mp4 -i 461.png -filter_complex " \
    [0:v]trim=start_frame=29:end_frame=100,scale=640:-1,crop=240:in_h-50:200:50,split[ch1][ch2]; \
    [ch1]pad=width=640:height=890:x=0:y=890-ih:color=white[ch1padded]; \
    [ch2]pad=width=640:height=890:x=640-iw:y=890-ih:color=black@0[ch2padded]; \
    [ch1padded][ch2padded]overlay[chbothpadded]; \
    [chbothpadded][1]overlay,scale=320:-1,split[final1][final2]; \
    [final1]palettegen[palette]; \
    [final2][palette]paletteuse" \
-y taiso.gif
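On those magic numbers - one way to cut down the trial and error is to preview candidate values with ffplay, which accepts the same filters via -vf, rather than rendering a gif each time. A sketch:

ffplay -ss 03:25 -vf "scale=640:-1,crop=240:in_h-50:200:50" ymo_taiso.mp4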

Labelling

Adding a caption - useful both for traditional impact-font memes and for subtitling. In this case I have added a two-line caption, and sped the footage up using setpts. The heavy lifting here is done by drawtext, which has dozens of options - check those out if you're trying to do something specific.

ffmpeg -i angry.mp4 -filter_complex "pad=height=ih+74:y=74:color=white, \
    drawtext=fontfile=/mnt/c/Windows/Fonts/impact.ttf:fontsize=32:text_shaping=1:x=(main_w-text_w)/2:y=5:text='Warlock mains when', \
    drawtext=fontfile=/mnt/c/Windows/Fonts/impact.ttf:fontsize=32:text_shaping=1:x=(main_w-text_w)/2:y=10+line_h:text='jumping is required', \
    setpts=0.6*PTS[v]" -map "[v]" -y warlock.mp4

You can see here that we perform a pad to give us some whitespace at the top of the frame - roughly twice the font size plus a bit more. We then draw the two lines of text separately - partly because there is no convenient way to embed newlines, but also because there is no built-in way to centre multiple rows against the video. You can see that we calculate the right horizontal position using the inbuilt variables main_w and text_w, and the proper y position for the second line using line_h.

Due to the use of setpts, I have dropped the audio by using map to specify only a video stream. You can dub your own music in, turn it into a GIF, whatever. Another way to get this effect - particularly if you want other decorations or graphics - is just to create an appropriately sized graphic and use overlay, as sketched below.
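A sketch of that overlay approach, assuming a hypothetical caption.png - a transparent image, the same size as the padded frame, with the text already drawn into the top band:

ffmpeg -i angry.mp4 -i caption.png -filter_complex " \
    [0:v]pad=height=ih+74:y=74:color=white[padded]; \
    [padded][1:v]overlay=x=0:y=0,setpts=0.6*PTS[v]" \
-map "[v]" -y warlock.mp4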

Traditional greenscreen chromakey, audio mix, muffling

The greenscreen template for this switches from shot to reverse shot at around 6.2 seconds (out of slightly more than 8.5 seconds). Here, I want to have the background music from the second video playing faintly in the background, and then bring it up to (painfully) loud when the shot reverses and the greenscreen allows us to see it. I fast forward two minutes into the second video so the music is already well established.

ffmpeg -i joe-green.mp4 -ss 2:00 -i auto.mkv -filter_complex " \
    [0:v]chromakey=color=0x00ff01:similarity=0.15[ckout]; \
    [1:v]pad=width=1920:x=(1920-iw)/2:color=white[pad]; \
    [pad][ckout]overlay,trim=end=8.5[vout]; \
    [0:a]atrim=end=8.5[at]; \
    [1:a]atrim=end=6.2,apad=pad_dur=2.3,acompressor,volume=volume=0.5[at1]; \
    [1:a]atrim=start=6.2:end=8.5,adelay=6200|6200[at2]; \
    [at][at1][at2]amix=inputs=3:duration=first:weights='1.0 0.2 1'[aout]" \
-map "[vout]" -map "[aout]" -y test.mp4

The chromakey filter is fairly straightforward to read, as long as you know what hex colour values are and how to use a colour picker on your source. Likewise, a simple pad makes sure the two sources are the same size. The more complex part is the audio.

I have decided to trim everything to 8.5 seconds to avoid a glitch on the last few frames of the greenscreen. I use atrim to bring the first file's audio to this length, and then work on the second file's audio in two copies. The first copy is shortened to 6.2 seconds with atrim, then has 2.3 seconds of silent apad added; acompressor and volume make it sound muffled and distant. The second copy goes through atrim, this time keeping only the part from 6.2 to 8.5 seconds. Silence needs to be added to the start of this, which is done with adelay - note that a millisecond delay has to be given for each of the left and right audio channels here.

Finally, amix combines all three audio streams using a weights table, and that stream is named aout and then used by map.
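A small aside on adelay: if your ffmpeg is reasonably recent (4.2 or later), its all option applies one delay figure to every channel, saving the per-channel list - the [at2] line above could instead read:

[1:a]atrim=start=6.2:end=8.5,adelay=delays=6200:all=1[at2]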

What / Sanctuary Guardian

Here, we will let the video play until a certain timestamp, and then pause it. We draw a black border around the paused image, add a subtitle of "what", and change the audio to use the "Sanctuary Guardian" theme from Earthbound.

ffmpeg -i ymo_taiso.mp4 -i sanctuary_guardian.opus -filter_complex " \
    [0:v]trim=start=49:end=57.75,setpts=PTS-STARTPTS,split[sv][svf]; \
    [0:a]atrim=start=49:end=57.75,asetpts=PTS-STARTPTS[sa]; \
    [sv]tpad=stop_duration=8.97:stop_mode=clone,trim=start=8.75,setpts=PTS-STARTPTS[efreezev]; \
    [efreezev]drawbox=h=80:c=black:t=fill,drawbox=h=80:y=ih-80:c=black:t=fill, \
    drawbox=w=80:h=ih:c=black:t=fill,drawbox=w=80:h=ih:c=black:t=fill:x=iw-80[eboxv]; \
    [eboxv]drawtext=fontsize=64:text='what':fontcolor=white:y=main_h-line_h-15:x=(main_w-text_w)/2[etextv]; \
    [svf][sa][etextv][1:a]concat=2:v=1:a=1" -y what.mp4 

This combines elements from previous examples, including a frame hold, trimming, and adding text. New here is drawbox, used to draw the black frame around the edges.

Red box zoom / highlight / "perpetrator"

Here, we will zoom in on a particular area of the screen and composite it onto the main video with a red border using overlay. We split off a copy of the original video, crop it, and scale it up. We add a red border with pad, and then overlay it onto the original at specified coordinates.

In the example, I decided that I wanted to add a "What" style held frame at the end - so I need to split off an additional copy, use trim to reduce it down to a single frame, tpad to stretch it out (oddly, the length of the stretch doesn't seem to matter that much, as the output duration is governed by the audio), scale it up, drawbox to make our frame, and add the text. Finally, this is joined onto the original with concat:

ffmpeg -ss 9:40 -to 9:48 -i "Destiny 2 2023-04-25 21-50-13.mp4" -i sanctuary_guardian.opus -filter_complex " \
    split=3[vo][vz][vf]; \
    [vz]crop=w=160:h=160:x=iw/2-220:y=ih/2-105,scale=w=iw*3:h=-1,pad=h=ih+4:w=iw+4:x=-1:y=-1:color=red[vzz]; \
    [vo][vzz]overlay=x=main_w-overlay_w-50:y=50[combined]; \
    [vf]trim=start_frame=206:end_frame=207,crop=w=384:h=216:x=iw/2-340:y=ih/2-105,setpts=PTS-STARTPTS, \
    tpad=start_mode=clone:start_duration=30,scale=w=1920:h=-1, \
    drawbox=h=80:c=black:t=fill,drawbox=h=80:y=ih-80:c=black:t=fill, \
    drawbox=w=80:h=ih:c=black:t=fill,drawbox=w=80:h=ih:c=black:t=fill:x=iw-80, \
    drawtext=fontsize=64:text='what':fontcolor=white:y=main_h-line_h-15:x=(main_w-text_w)/2[vff]; \
    [combined][0:a][vff][1:a]concat=n=2:v=1:a=1" \
-y what-snake.mp4

Something to watch out for here when using split is that you must consume the outputs of the split in the same order you create them - if the first filter had read split=3[vf][vo][vz], ffmpeg would have frozen up on the last frame. (Guess how I know.) Another whammy is around tpad - padding from the start of the single frame works well, but padding from the end results in several frames of black in the padded clip before the correct result kicks in.

Breaking a framesheet into a gif

ImageMagick's convert slices the sheet into individual 512x512 frames, and ffmpeg then assembles them (as written, into an mp4 - see below for a proper gif).

convert ../00298-1819686959-3x3frames.png -crop 512x512 +adjoin frame%03d.png
ffmpeg -f image2 -framerate 4 -i frame%03d.png frames.mp4
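To get a genuine gif out of the frames instead, the palettegen/paletteuse trick from the looped animation recipe above applies here too:

ffmpeg -f image2 -framerate 4 -i frame%03d.png -filter_complex " \
    split[a][b]; \
    [a]palettegen[pal]; \
    [b][pal]paletteuse" -y frames.gif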

Ripping audio (here from opus to alac)

Here -vn drops the video stream, and -acodec alac re-encodes the audio to Apple Lossless so it can live in an m4a container.

ffmpeg -i source.webm -vn -acodec alac "bob_dylan_-_we_better_talk_this_over_-_senor.m4a"
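If you don't need a format conversion at all, the audio stream can usually be copied out of a webm untouched - a sketch, assuming the source audio really is opus (check with ffprobe), with a placeholder output name:

ffmpeg -i source.webm -vn -acodec copy ripped.opus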