
@HarshK23
Last active March 24, 2024 15:50
Google Summer of Code 2023 | FFmpeg

FFmpeg - Implementing Audio Overlay Filter | GSoC 2023

This project was part of Google Summer of Code 2023. The aim of the project was to implement the Audio Overlay Filter (aoverlay) as part of FFmpeg's libavfilter library, along with the necessary documentation.

The filter replaces specified sections of one audio stream with a second input audio stream. Possible use cases include censoring parts of an audio stream, adding a voiceover to correct or replace dialogue, and dubbing into another language.

After the GSoC period ended, FATE (FFmpeg Automated Testing Environment) regression tests were added for the filter.

What work was done

The filter has two modes of operation:

  • Timeline Mode

    The user specifies the time interval during which the second input stream replaces the first input stream in the output.

    ffmpeg -i first.wav -i second.wav -filter_complex "aoverlay=enable='between(t,10,20)'" output.wav

    The output in this case will have the second stream from t=10 seconds to t=20 seconds.

  • Default Mode

    If the user specifies no time interval, the filter looks for PTS (Presentation TimeStamp) gaps in the first input stream and fills them with the second input stream, so that the output stream's PTS values increase monotonically without gaps.

    ffmpeg -i first.wav -i second.wav -filter_complex "[0]aselect='not(between(t,4,8))'[temp];[temp][1]aoverlay[out]" -map "[out]" output.wav

    In this case, the aselect filter drops the first stream's audio frames whose timestamps fall between t=4 seconds and t=8 seconds, creating a PTS gap. The aoverlay filter then detects the gap and fills it with the second input stream, so the output contains the second stream from t=4 seconds to t=8 seconds.

Crossfading

Linear crossfading is applied between the two streams at each transition from one stream to the other.

The crossfade duration, in seconds, can be set with the cf_duration option.

ffmpeg -i first.wav -i second.wav -filter_complex "aoverlay=cf_duration=2:enable='between(t,10,20)'" output.wav

Crossfading works differently in the two modes:

  • Timeline Mode - The crossfade starts cf_duration/2 seconds before the transition and ends cf_duration/2 seconds after it, so both input streams are at 50% volume at the transition point.

  • Default Mode - When switching from the first stream to the second, the crossfade starts cf_duration seconds before the transition, so the first stream's volume reaches 0% exactly at the transition point. When switching from the second stream back to the first at the end of the PTS gap, the crossfade starts at the transition and lasts cf_duration seconds, so the first stream's volume rises from 0% at the transition to 100% at the end of the crossfade.

FATE tests for the four cases (timeline mode, default mode, and manual crossfading in each of these modes) have been added to catch regressions in the future.

What code got merged

The patchset for the filter and the FATE tests is currently under review on FFmpeg Patchwork.

Link to the code

Commit for the code and the documentation

Commit for the FATE tests

What's left to do

  • Support for more types of crossfading (logarithmic, cubic, etc.).