Mixing Recordings

Working with Twilio Room Recordings

The following guide will show you how to mix several audio and video tracks together, forming a grid. For this example, we will use two video and two audio tracks. The video tracks will be placed side by side in a 1024x768 output file.


UPDATE - Video Recording Compositions API is out!

Yes! No need to go through this process alone anymore. We've recently released the Twilio Recording Composition API. This API will allow you to compose and transcode your Room Recordings. You can find the reference docs here.

When mixing the tracks, we need to consider that they might have (and probably have) started at different times. If we were to merge tracks without taking this into account, we would end up with synchronization issues. In our example, Bob got in the room a good 20 s after Alice (and that's a really huge gap when synchronizing audio), so naively mixing Alice's and Bob's audio tracks together would end up with one speaking over the other.

To make merging easier, the start_time of every track from the same room is relative to the creation of the room itself. Let's get the start times for all the tracks from this room:

  • Get Alice's audio start_time

    $ ffprobe -show_entries format=start_time alice.mka
    Input #0, matroska,webm, from 'alice.mka':
      Metadata:
        encoder         : GStreamer matroskamux version 1.8.1.1
        creation_time   : 2017-06-30T09:03:44.000000Z
      Duration: 00:13:09.36, start: 1.564000, bitrate: 48 kb/s
        Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
        Metadata:
          title           : Audio
    start_time=1.564000
    
  • Get Alice's video start_time

    $ ffprobe -show_entries format=start_time alice.mkv
    Input #0, matroska,webm, from 'alice.mkv':
      Metadata:
        encoder         : GStreamer matroskamux version 1.8.1.1
        creation_time   : 2017-06-30T09:03:44.000000Z
      Duration: 00:13:09.33, start: 1.584000, bitrate: 857 kb/s
        Stream #0:0(eng): Video: vp8, yuv420p(progressive), 640x480, SAR 1:1 DAR 4:3, 1k tbr, 1k tbn, 1k tbc (default)
        Metadata:
          title           : Video
    start_time=1.584000
    
  • Get Bob's audio start_time

    $ ffprobe -show_entries format=start_time bob.mka
    Input #0, matroska,webm, from 'bob.mka':
      Metadata:
        encoder         : GStreamer matroskamux version 1.8.1.1
        creation_time   : 2017-06-30T09:04:03.000000Z
      Duration: 00:12:49.46, start: 20.789000, bitrate: 50 kb/s
        Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
        Metadata:
          title           : Audio
    start_time=20.789000
    
  • Get Bob's video start_time

    $ ffprobe -show_entries format=start_time bob.mkv
    ffprobe version 3.3.2 Copyright (c) 2007-2017 the FFmpeg developers
      built with Apple LLVM version 8.0.0 (clang-800.0.42.1)
    Input #0, matroska,webm, from 'bob.mkv':
      Metadata:
        encoder         : GStreamer matroskamux version 1.8.1.1
        creation_time   : 2017-06-30T09:04:03.000000Z
      Duration: 00:12:49.42, start: 20.814000, bitrate: 1645 kb/s
        Stream #0:0(eng): Video: vp8, yuv420p(progressive), 640x480, SAR 1:1 DAR 4:3, 1k tbr, 1k tbn, 1k tbc (default)
        Metadata:
          title           : Video
    start_time=20.814000
    
| Track | start_time (ms) | creation_time |
| --- | --- | --- |
| alice.mka | 1564 | 2017-06-30T09:03:44.000000Z |
| alice.mkv | 1584 | 2017-06-30T09:03:44.000000Z |
| bob.mka | 20789 | 2017-06-30T09:04:03.000000Z |
| bob.mkv | 20814 | 2017-06-30T09:04:03.000000Z |
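
If you only need the bare numeric value, ffprobe can drop the wrappers and keys (standard ffprobe flags; output shown for alice.mka from above):

    $ ffprobe -v error -show_entries format=start_time -of default=noprint_wrappers=1:nokey=1 alice.mka
    1.564000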

We can see that the start_time of different media types (audio and video) is not the same even for the same participant, as media arrives with a slight offset after the WebRTC negotiation. Offsets in creation_time translate into an equivalent offset in start_time. The ~20 s that Bob kept Alice waiting can be obtained directly from the start_time. Had the start_time of a track been relative to its own creation_time, we would have had to first get the offset between creation_time values and then calculate the final offset; since creation_time does not have millisecond precision, this could lead to synchronization issues.

When merging the different tracks, we'll need to use as time reference the one that has the lowest start_time. This is important to keep all tracks in sync. What we will do is:

  • Take the lowest start_time value of all tracks. In our case that's alice.mka, with a start_time of 1564 ms.
  • Use that track as the reference, by not indicating any offset when mixing tracks. We will reflect that in our track list with an offset of 0.
  • Calculate the offset of each remaining track by subtracting the reference start_time from its own. The following table shows the resulting values for our tracks, with alice.mka as the reference at 1564 ms (a small script automating this calculation follows the table):
| Track | Track # | Current value (ms) | Reference value (ms) | Offset (ms) |
| --- | --- | --- | --- | --- |
| alice.mka | 0 | 1564 | 1564 | 0 |
| alice.mkv | 1 | 1584 | 1564 | 20 |
| bob.mka | 2 | 20789 | 1564 | 19225 |
| bob.mkv | 3 | 20814 | 1564 | 19250 |
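
Doing this arithmetic by hand gets tedious as rooms grow. Here is a minimal bash sketch of the same calculation, assuming the four example files sit in the current directory (adjust the tracks list for your own room):

#!/usr/bin/env bash
# Print each track's offset in ms, relative to the earliest start_time.
tracks=(alice.mka alice.mkv bob.mka bob.mkv)

# Gather start times (fractional seconds) and convert them to integer ms.
declare -A start_ms
for t in "${tracks[@]}"; do
  s=$(ffprobe -v error -show_entries format=start_time \
      -of default=noprint_wrappers=1:nokey=1 "$t")
  start_ms[$t]=$(awk -v s="$s" 'BEGIN { printf "%.0f", s * 1000 }')
done

# The reference is the track with the lowest start_time.
ref=$(printf '%s\n' "${start_ms[@]}" | sort -n | head -n1)

for t in "${tracks[@]}"; do
  echo "$t offset: $(( start_ms[$t] - ref )) ms"
done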

Anatomy of an ffmpeg command

Before we start mixing the files, we're going to have a quick overview of the ffmpeg command-line arguments that we are going to use. All commands will have the following structure: ffmpeg [[infile options] -i infile] {filter options} {[outfile options] outfile} (a toy example follows the list below).

  • ffmpeg: program name
  • [[infile options] -i infile]: This tells the program what input files to use. You’ll need to add -i for each file that you want to add to the conversion process. [infile options] will only be used for video tracks, to indicate the offset with the -itsoffset flag. The position of a file in the inputs list is important, as we’ll later use this position to reference the track. ffmpeg treats this input list as a zero-based array. References to input tracks have the form [#input], where #input is the position the track has in the input array. For instance, in the list ffmpeg -i alice.mkv -i bob.mkv -i alice.mka -i bob.mka we would use these references
    • [0]: alice.mkv, the first input
    • [1]: bob.mkv, the second input
    • [2]: alice.mka, the third input
    • [3]: bob.mka, the fourth input
  • {filter options}: this is where we define what to do with the input tracks, whether that is mixing audio, video, or both. A media stream (audio or video) can be passed through a set of steps, each step modifying the stream returned by the previous one. For instance, in the case of the video inputs, we're going to scale->pad->generate black frames for synchronization->concatenate. The output of this "pipeline" is named [r#c#], with r and c standing for row and column. Since we only have two videos, we could make do with just [r#], but this naming makes it easier for you to extrapolate to other scenarios.
  • {[outfile options] outfile}: defined where to store the output of the program. This is where you specify webm vs mp4, for instance.
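
As a toy illustration of this structure (the file name, offset, and labels here are placeholders, not part of the recipe below):

# [[infile options] -i infile]   : one input, shifted 19.25 s via -itsoffset
# {filter options}               : scale input [0] and label the result [v]
# {[outfile options] outfile}    : encode [v] with VP8 into out.webm
ffmpeg -itsoffset 19.25 -i bob.mkv \
       -filter_complex "[0]scale=512:-2[v]" \
       -map [v] -vcodec libvpx out.webm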

Mixing tracks

We're going to mix all tracks in a single step. We'll explain the resulting command as clearly as possible, but be advised that this is not for the faint of heart. The command that we are going to use:

  • Keeps video and audio tracks in synchronization
  • Lets you change the output video resolution
  • Pads the video tracks to keep aspect ratio of the original videos

The complete command to obtain the mixed file in webm, with a 1024x768 resolution, is:

ffmpeg -i alice.mkv -i bob.mkv -acodec libopus -i alice.mka -acodec libopus -i bob.mka -y \
       -filter_complex "\
        [0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=0.020[b0],[b0][vs0]concat[r0c0];\
        [1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=19.25[b1],[b1][vs1]concat[r0c1];\
        [r0c0][r0c1]hstack=inputs=2[video];\
        [2]aresample=async=1[a0];\
        [3]aresample=async=1,adelay=19225.0|19225.0[a1];\
        [a0][a1]amix=inputs=2[audio]" \
       -map [video] \
       -map [audio] \
       -acodec libopus \
       -vcodec libvpx \
        output.webm

Let's dissect this command

  • -i alice.mkv -i bob.mkv -i alice.mka -i bob.mka: These are the input files.
  • -filter_complex: we’ll be performing a filter operation on all tracks passed as input.
    • [0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=0.020[b0],[b0][vs0]concat[r0c0]: Here we take Alice's video and scale it to half the width of the desired resolution (512) while maintaining the original aspect ratio. We pad the scaled video and tag it [vs0]. Then we generate color=black frames for the duration of the offset in seconds calculated for this track, which will delay the track so that it stays in sync. Finally, we concat the black stream [b0] with the padded stream [vs0] and tag the result as [r0c0].
    • [1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=19.25[b1],[b1][vs1]concat[r0c1]: This part is the same as the previous one, but the offset used for the duration corresponds to this track.
    • [r0c0][r0c1]hstack=inputs=2[video]: stacks the two tagged video streams side by side (horizontally) and tags the combined stream as [video].
    • [#]aresample=async=1,adelay={delay}|{delay}[a#];: For each track with an offset value > 0, we need to indicate the audio delay in milliseconds for that track
      • [#]: As explained in the Anatomy of an ffmpeg command section above, each track is referenced by its position in the inputs array. Since only bob.mka is delayed, there's only one such block.
      • aresample=async=1: resamples the audio track, filling and trimming if needed. See more info in the resampler docs.
      • adelay={delay}|{delay}: delays the audio of both the left and right channels by {delay} milliseconds.
      • [a#]: this is a label that we’ll use to reference this filtered track
    • [a0][a1]..[an]amix=inputs={#of-inputs}: once we have added the appropriate delays for all audio tracks, we configure the filter that'll perform the actual audio mixing, where [an] is the label of the n-th filtered track. In our case there are only two tracks, which is why we use [a0][a1].
  • Output definition
    • -map [video]: Select the stream tagged [video] to be used in the output
    • -map [audio]: Select the stream tagged [audio] to be used in the output
    • -acodec libopus: The audio codec to use. For webm we'll use OPUS
    • -vcodec libvpx: The video codec to use. For webm we'll use VP8
    • output.webm: The output file name

And for an mp4 file:

ffmpeg -i alice.mkv -i bob.mkv -acodec libopus -i alice.mka -acodec libopus -i bob.mka -y \
       -filter_complex "\
        [0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=0.020[b0],[b0][vs0]concat[r0c0];\
        [1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=19.25[b1],[b1][vs1]concat[r0c1];\
        [r0c0][r0c1]hstack=inputs=2[video];\
        [2]aresample=async=1[a0];\
        [3]aresample=async=1,adelay=19225.0|19225.0[a1];\
        [a0][a1]amix=inputs=2[audio]" \
       -map [video] \
       -map [audio] \
       -acodec libfdk_aac \
       -vcodec libx264 \
        output.mp4
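
A couple of notes on this variant. libfdk_aac is only available if your ffmpeg was built with --enable-libfdk-aac; if yours wasn't, the built-in encoder (-acodec aac) should work as a drop-in replacement.

Also, the [r#c#] naming pays off when extrapolating to bigger grids. For four participants in a 2x2 grid at the same 1024x768 output, each video would be scaled and padded into a 512x384 tile, each row built with hstack, and the two rows combined with vstack. A sketch of the video half of such a filter graph (per-track black-frame synchronization omitted for brevity; an extrapolation of the recipe above, not a tested command):

[0]scale=512:-2,pad=512:384:(ow-iw)/2:(oh-ih)/2[r0c0];
[1]scale=512:-2,pad=512:384:(ow-iw)/2:(oh-ih)/2[r0c1];
[2]scale=512:-2,pad=512:384:(ow-iw)/2:(oh-ih)/2[r1c0];
[3]scale=512:-2,pad=512:384:(ow-iw)/2:(oh-ih)/2[r1c1];
[r0c0][r0c1]hstack=inputs=2[r0];
[r1c0][r1c1]hstack=inputs=2[r1];
[r0][r1]vstack=inputs=2[video]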
@sunilsharmaji

I am trying to use this command for merging but didn't have any success; it throws the error mentioned below. Please help me resolve it.
Command:

ffmpeg -i RT63dad03605fa55a68d6e803ae59eec51.mkv -i RTdb342ae19c78a92faee49df03bd4662b.mkv -i RTa462845e22e9ccfd6aa159a62cb562a9.mka -i RT06d9f2f4f51907b47b2d4648b4465bd3.mka -y \
       -filter_complex "\
        [0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=0.079[b0],[b0][vs0]concat[r0c0];\
        [1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=34.510[b1],[b1][vs1]concat[r0c1];\
        [r0c0][r0c1]hstack=inputs=2[video];\
        [3]adelay=34338.0|34338.0[a1];\
        [2][a1]amix=inputs=2[audio]" \
       -map [video] \
       -map [audio] \
        -acodec libopus \
       -vcodec libvpx \
        output.webm

Error

$  ffmpeg -i RT63dad03605fa55a68d6e803ae59eec51.mkv -i RTdb342ae19c78a92faee49df03bd4662b.mkv -i RTa462845e22e9ccfd6aa159a62cb562a9.mka -i RT06d9f2f4f51907b47b2d4648b4465bd3.mka -y \
>        -filter_complex "\
>         [0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=0.079[b0],[b0][vs0]concat[r0c0];\
>         [1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=34.510[b1],[b1][vs1]concat[r0c1];\
>         [r0c0][r0c1]hstack=inputs=2[video];\
>         [3]adelay=34338.0|34338.0[a1];\
>         [2][a1]amix=inputs=2[audio]" \
>        -map [video] \
>        -map [audio] \
>         -acodec libopus \
>        -vcodec libvpx \
>         output.webm
ffmpeg version N-89395-g71421f382f Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 7.2.0 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-cuda --enable-cuvid --enable-d3d11va --enable-nvenc --enable-dxva2 --enable-avisynth --enable-libmfx
  libavutil      56.  5.100 / 56.  5.100
  libavcodec     58.  6.102 / 58.  6.102
  libavformat    58.  2.103 / 58.  2.103
  libavdevice    58.  0.100 / 58.  0.100
  libavfilter     7.  6.100 /  7.  6.100
  libswscale      5.  0.101 /  5.  0.101
  libswresample   3.  0.101 /  3.  0.101
  libpostproc    55.  0.100 / 55.  0.100
Input #0, matroska,webm, from 'RT63dad03605fa55a68d6e803ae59eec51.mkv':
  Metadata:
    encoder         : GStreamer matroskamux version 1.8.1.1
    creation_time   : 2018-01-11T06:08:34.000000Z
  Duration: 00:08:31.93, start: 12.791000, bitrate: 1355 kb/s
    Stream #0:0(eng): Video: vp8, yuv420p(progressive), 720x1280, SAR 1:1 DAR 9:16, 1k tbr, 1k tbn, 1k tbc (default)
    Metadata:
      title           : Video
Input #1, matroska,webm, from 'RTdb342ae19c78a92faee49df03bd4662b.mkv':
  Metadata:
    encoder         : GStreamer matroskamux version 1.8.1.1
    creation_time   : 2018-01-11T06:09:09.000000Z
  Duration: 00:07:52.82, start: 47.222000, bitrate: 799 kb/s
    Stream #1:0(eng): Video: vp8, yuv420p(progressive), 720x1280, SAR 1:1 DAR 9:16, 1k tbr, 1k tbn, 1k tbc (default)
    Metadata:
      title           : Video
Input #2, matroska,webm, from 'RTa462845e22e9ccfd6aa159a62cb562a9.mka':
  Metadata:
    encoder         : GStreamer matroskamux version 1.8.1.1
    creation_time   : 2018-01-11T06:08:34.000000Z
  Duration: 00:08:32.20, start: 12.712000, bitrate: 45 kb/s
    Stream #2:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      title           : Audio
Input #3, matroska,webm, from 'RT06d9f2f4f51907b47b2d4648b4465bd3.mka':
  Metadata:
    encoder         : GStreamer matroskamux version 1.8.1.1
    creation_time   : 2018-01-11T06:09:08.000000Z
  Duration: 00:07:48.80, start: 47.050000, bitrate: 11 kb/s
    Stream #3:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      title           : Audio
Stream mapping:
  Stream #0:0 (vp8) -> scale
  Stream #1:0 (vp8) -> scale
  Stream #2:0 (opus) -> amix:input0
  Stream #3:0 (opus) -> adelay
  hstack -> Stream #0:0 (libvpx)
  amix -> Stream #0:1 (libopus)
Press [q] to stop, [?] for help
[Parsed_pad_1 @ 000001fbd518ef00] Input area 0:-72:512:838 not within the padded area 0:0:512:768 or zero-sized
[Parsed_pad_1 @ 000001fbd518ef00] Failed to configure input pad on Parsed_pad_1
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #2:0
Conversion failed!

@dipernaa

dipernaa commented Jan 11, 2018

adelay={delay}|{delay}: will be delaying the audio for both left and right channels an amount of {delay} seconds.
Is this statement incorrect or is the [3]adelay=19225.0|19225.0[a1];\ statement incorrect? The first says seconds but the actual command uses milliseconds.

@sunilsharmaji

sunilsharmaji commented Feb 14, 2018

@dipernaa
No, my statement is not wrong. I calculated it the same way you did. [3]adelay=19225.0|19225.0[a1] is in milliseconds, as in your example, and I use duration=19.25[b1] in seconds.

My parameters are [3]adelay=34338.0|34338.0[a1]; in milliseconds, and duration=34.510[b1] in seconds.

@igracia
Author

igracia commented Mar 7, 2018

@sunilsharmaji The error you are getting is because the video dimensions are off. You're hitting this issue http://www.ffmpeg-archive.org/Input-area-not-within-the-padded-area-or-zero-sized-td2403634.html
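
In this particular case the inputs are portrait (720x1280), so scale=512:-2 yields a frame taller than the 768 px pad target. One way around it, if you keep the 512x768 tiles, is to let scale cap both dimensions before padding (a sketch, not tested against these exact files):

[0]scale=512:768:force_original_aspect_ratio=decrease,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0]

force_original_aspect_ratio=decrease shrinks the frame until it fits inside 512x768 while keeping its aspect ratio, so the subsequent pad always has room.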

@filipecrosk

@igracia thanks for sharing it. Just a note: with your last edit you dropped a semicolon and a backslash at the end of [2]aresample=async=1[a0]; it should be: [2]aresample=async=1[a0];\

@vpop

vpop commented Apr 17, 2018

This is an awesome guide, so thank you both @ktoraskartwilio for creating the original and @igracia for enhancing it.

As you guys might know, if you refresh your browser during a Twilio webRTC recorded session you will end up with multiple video and audio segments (recordings) per participant. For example, if Alice was to refresh her browser once, you could end up with the following table:

| Track | start_time (ms) | creation_time | duration (ms) |
| --- | --- | --- | --- |
| alice1.mka | 1564 | 2017-06-30T09:03:44.000000Z | 1020 |
| alice1.mkv | 1584 | 2017-06-30T09:03:44.000000Z | 1000 |
| alice2.mka | 4584 | 2017-06-30T09:03:47.000000Z | 60000 |
| alice2.mkv | 4584 | 2017-06-30T09:03:47.000000Z | 60000 |
| bob.mka | 20789 | 2017-06-30T09:04:03.000000Z | 120000 |
| bob.mkv | 20814 | 2017-06-30T09:04:03.000000Z | 120000 |

How would you change your ffmpeg command to accommodate two (or more) video and audio segments per participant, with black frames during the downtime (browser refresh)? Meaning, on Alice's side of the final video there should be two seconds of black frames between her two video segments.

We've found this answer, but it's not using filter_complex and might require us to run a separate ffmpeg command for each participant, then a final one to bring the resulting videos together, unless they can be combined. Still investigating that part.

@filipecrosk

@vpop I've been doing it in steps, one participant at a time, and then merging each participant's final video.
In my case, sometimes they had to refresh the browser and other times they added a screenshare or changed the camera, so you can have one audio track and multiple video tracks.
I'm not sure if we can do it in just one command, and maybe it's best to keep it in steps so you can easily track errors.
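
For the per-participant step, here is a minimal sketch of stitching Alice's two segments with black frames covering the gap, reusing the color/concat trick from the guide (the 2 s gap and 640x480 size come from @vpop's table; video only, not a tested command):

ffmpeg -i alice1.mkv -i alice2.mkv -y \
       -filter_complex "\
        color=black:size=640x480:duration=2[gap];\
        [0][gap][1]concat=n=3[v]" \
       -map [v] -vcodec libvpx alice_full.webm

The audio segments would get the analogous treatment with anullsrc generating silence for the gap, or can be handled separately as described above.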

@rkg199

rkg199 commented May 15, 2018

Hello team,
I have implemented the new code, and the final command created for merging videos is below:

ffmpeg -i /var/www/html/swipr/uploads/test/video1.mkv -i /var/www/html/swipr/uploads/test/video2.mkv -acodec libopus -i /var/www/html/swipr/uploads/test/audio1.mka -acodec libopus -i /var/www/html/swipr/uploads/test/audio2.mka -y -filter_complex " [0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=2.716[b0],[b0][vs0]concat[r0c0];[1]scale=512:-2,pa=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=0[b1],[b1][vs1]concat[r0c1];[r0c0][r0c1]hstack=inputs=2[video];[2]aresample=async=1,adelay=2743.0|2743.0[a0];[3]aresample=async=1,adelay=5.0|5.0[a1];[a0][a1]amix=inputs=2[audio]" -map[video] \ -map[audio] \ -acodec libopus \ -vcodec libvpx \ output.webm

After trying the above command in different scenarios, we are getting the following errors.

#1)[AVFilterGraph @ 0x67fef00] No such filter: ' '
Error initializing complex filters.
Invalid argument

#2)Unrecognized option 'filter_complex[0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=2.716[b0],[b0][vs0]concat[r0c0];[1]scale=512:-2,pa=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=0[b1],[b1][vs1]concat[r0c1];[r0c0][r0c1]hstack=inputs=2[video];[2]aresample=async=1,adelay=2743.0|2743.0[a0];[3]aresample=async=1,adelay=5.0|5.0[a1];[a0][a1]amix=inputs=2[audio]-map[video]-map[audio]-acodec'.
Error splitting the argument list: Option not found

#3)Unrecognized option 'map[video]'.
Error splitting the argument list: Option not found

If we remove all the filters, we get a final output, but it is not useful.

Can anybody help me with this? That would be greatly appreciated.

@igracia
Author

igracia commented Jul 5, 2018

@rkg199 I think you are missing a line break after the quotes that enclose the complex filter definition, or a space after them. Note also that error #3 comes from -map[video] missing a space (it must be -map [video]), and pa= should read pad=. Let me format that correctly and see if that makes sense

ffmpeg \
-i /var/www/html/swipr/uploads/test/video1.mkv \
-i /var/www/html/swipr/uploads/test/video2.mkv \
-acodec libopus -i /var/www/html/swipr/uploads/test/audio1.mka \
-acodec libopus -i /var/www/html/swipr/uploads/test/audio2.mka \
-y -filter_complex "\
[0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=2.716[b0],[b0][vs0]concat[r0c0];\
[1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=0[b1],[b1][vs1]concat[r0c1];\
[r0c0][r0c1]hstack=inputs=2[video];\
[2]aresample=async=1,adelay=2743.0|2743.0[a0];\
[3]aresample=async=1,adelay=5.0|5.0[a1];\
[a0][a1]amix=inputs=2[audio]" \
-map [video] \
-map [audio] \
-acodec libopus \
-vcodec libvpx \
output.webm

@chandra-shekhar

chandra-shekhar commented Jun 21, 2019

Hi there

I am trying the same method, but it looks like the concatenation process is too slow when combining two videos of 8 minutes and 2 minutes. It took over an hour and the video still wasn't fully combined.

It starts well, but after a certain point, when almost half the video is prepared, it becomes very slow and the FPS value drops dramatically.
Am I missing something here?

Following is the command:

ffmpeg -i /tmp/RT5e613a1f3d58024c1b1575db2d6df481.mkv -i /tmp/RTc9a717fa1500b9eb6facffd8084e90e2.mkv -acodec libopus -i /tmp/RTc834bc574c49eab3cb8b2a37be1bfe14.mka -acodec libopus -i /tmp/RT5bbd0b3b9c095b3f9c412fb8e82eaa0f.mka -y \
-filter_complex "\
[0]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs0],color=black:size=512x768:duration=0.0030000000000001[b0],[b0][vs0]concat[r0c0];\
[1]scale=512:-2,pad=512:768:(ow-iw)/2:(oh-ih)/2[vs1],color=black:size=512x768:duration=177.032[b1],[b1][vs1]concat[r0c1];\
[r0c0][r0c1]hstack=inputs=2[video];\
[2]aresample=async=1[a0];
[3]aresample=async=1,adelay=177012|177012[a1];\
[a0][a1]amix=inputs=2[audio] " \
-map [video] \
-map [audio] \
-acodec libfdk_aac \
-vcodec libx264 \
composition_2089.mp4

@realies

realies commented Sep 21, 2023

Mixing down all .mka files in the current folder with Node.js:

const fs = require('fs').promises;
const { promisify } = require('util');
const exec = promisify(require('child_process').exec);

(async () => {
  // Probe every .mka file in the current folder and record its start_time in ms.
  const audioFiles = await Promise.all(
    (await fs.readdir('.')).filter(f => f.endsWith('.mka')).map(async f => ({
      name: f,
      start: parseFloat(JSON.parse((await exec(`ffprobe -v quiet -print_format json -show_entries format=start_time ${f}`)).stdout).format.start_time) * 1000
    }))
  );

  console.log('Audio Files:', JSON.stringify(audioFiles, null, 2));

  // The earliest start_time is the reference; every later track gets an adelay.
  const referenceTime = Math.min(...audioFiles.map(f => f.start));
  const filters = [], mix = [];

  audioFiles.forEach(({ start }, i) => {
    const delay = start - referenceTime;
    // Resample each track, delaying both channels of any track that started late.
    filters.push(`[${i}]aresample=async=1${delay > 0 ? `,adelay=${delay}|${delay}` : ''}[a${i}]`);
    mix.push(`[a${i}]`);
  });

  // Mix all per-track streams into a single [audio] stream, encoded as Opus.
  const ffmpegCmd = `ffmpeg ${audioFiles.map(f => `-acodec libopus -i ${f.name}`).join(' ')} -filter_complex "${[...filters, `${mix.join('')}amix=inputs=${audioFiles.length}[audio]`].join('; ')}" -map [audio] -acodec libopus output.mka`;

  console.log('Executing:', ffmpegCmd);

  exec(ffmpegCmd)
    .then(({ stdout, stderr }) => console.log(`stdout: ${stdout}\nstderr: ${stderr}`))
    .catch(e => console.error(`Error: ${e}`));
})();
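
Assuming the script is saved as, say, mix-audio.js (a name chosen here for illustration) next to the recordings, run it with node mix-audio.js; it prints the discovered tracks and the generated ffmpeg command before executing it.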
