This project was part of Google Summer of Code 2023. The aim of the project was to implement the Audio Overlay Filter (aoverlay) as part of FFmpeg's libavfilter library, along with the necessary documentation.
The filter replaces specified sections of an audio stream with another input audio stream. Possible use cases include censoring parts of an audio stream, adding a voiceover to correct or replace dialogue, and dubbing in another language.
After the GSoC period ended, FATE (FFmpeg Automated Testing Environment) regression tests were added for the filter.
The filter has two modes of operation:
- The user specifies the time interval in which the second input stream should replace the first input stream in the output.
  ffmpeg -i first.wav -i second.wav -filter_complex "aoverlay=enable='between(t,10,20)'" output.wav
  The output in this case will have the second stream from t=10 seconds to t=20 seconds.
- If no time interval is specified by the user, the filter checks for any PTS (Presentation TimeStamp) gaps in the first input stream and inserts the second input stream into those gaps such that the output stream's PTS values are monotonic.
  ffmpeg -i first.wav -i second.wav -filter_complex "[0]aselect='not(between(t,4,8))'[temp];[temp][1]aoverlay[out]" -map "[out]" output.wav
  The aselect filter in this case rejects all the audio samples from t=4 seconds to t=8 seconds in the first stream, creating a PTS gap. The aoverlay filter then detects the gap and inserts the second input stream from t=4 seconds to t=8 seconds in the output.
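The gap detection used when no time interval is given can be illustrated with a small sketch. This is not the filter's actual code (the filter is written in C inside libavfilter); representing frames as (pts, nb_samples) tuples with PTS counted in samples, and the function name find_pts_gaps, are assumptions made only for illustration.

```python
def find_pts_gaps(frames):
    """Return (gap_start, gap_end) pairs, in samples, between consecutive frames.

    `frames` is a list of (pts, nb_samples) tuples in presentation order,
    with pts expressed in samples. This representation is hypothetical and
    does not mirror FFmpeg's internal frame structure.
    """
    gaps = []
    expected = None  # PTS at which the next frame should start
    for pts, nb_samples in frames:
        if expected is not None and pts > expected:
            # The stream jumped forward: this is where a second
            # input could be inserted.
            gaps.append((expected, pts))
        expected = pts + nb_samples
    return gaps


frames = [(0, 1024), (1024, 1024), (4096, 1024)]
print(find_pts_gaps(frames))  # [(2048, 4096)]
```

In this toy input the third frame starts at sample 4096 although the second one ended at 2048, so the span between them is reported as a gap.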
Linear cross fading is performed between the two streams at points of transition from one stream to another.
The duration for the cross fading can be specified by the user as the option cf_duration.
ffmpeg -i first.wav -i second.wav -filter_complex "aoverlay=cf_duration=2:enable='between(t,10,20)'" output.wav
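To make "linear cross fading" concrete, here is a minimal sketch of mixing two equally long blocks of samples with linearly changing gains. It assumes mono floating-point samples held in Python lists and a ramp that includes both endpoints; the function name linear_crossfade is hypothetical, and the filter's own implementation may differ in these details.

```python
def linear_crossfade(outgoing, incoming):
    """Mix two equally long sample lists with a linear cross fade.

    The outgoing stream's gain ramps from 1 to 0 while the incoming
    stream's gain ramps from 0 to 1. Illustrative sketch only, not
    the filter's code.
    """
    n = len(outgoing)
    if n != len(incoming):
        raise ValueError("both inputs must have the same length")
    mixed = []
    for i in range(n):
        # Fraction of the way through the fade, from 0.0 to 1.0.
        gain_in = i / (n - 1) if n > 1 else 1.0
        mixed.append(outgoing[i] * (1.0 - gain_in) + incoming[i] * gain_in)
    return mixed


print(linear_crossfade([1.0] * 5, [0.0] * 5))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

With cf_duration given in seconds, the number of samples covered by such a fade would be cf_duration multiplied by the sample rate.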
Cross fading works differently in the two modes:
- Timeline Mode - The cross fade starts cf_duration/2 seconds before the transition and ends cf_duration/2 seconds after the transition, so that both the input streams are at 50% volume at the transition point.
- Default Mode - The cross fade starts cf_duration seconds before the transition when shifting from the first stream to the second stream, such that the first stream's volume becomes 0% at the transition point. When shifting from the second stream back to the first stream at the end of the PTS gap, the cross fade starts at the transition and lasts for cf_duration seconds, such that the first stream's volume starts at 0% at the transition and becomes 100% at the end of cf_duration seconds.
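The timing difference between the two modes can be sketched as a gain envelope for the first stream, with the second stream's gain taken as one minus that value. The function name first_stream_gain, its parameters, and the assumption that the replaced interval is longer than cf_duration are all illustrative choices, not taken from the filter's source.

```python
def first_stream_gain(t, start, end, cf_duration, mode):
    """Gain (0.0 to 1.0) of the first stream at time t, in seconds.

    `start` and `end` delimit the interval in which the second stream
    replaces the first (the enable interval in timeline mode, the PTS
    gap in default mode). Hypothetical helper; assumes
    end - start >= cf_duration.
    """
    if cf_duration <= 0:
        return 0.0 if start <= t < end else 1.0
    if mode == "timeline":
        # Fades are centred on each transition point.
        fade_out_begin = start - cf_duration / 2
        fade_in_begin = end - cf_duration / 2
    elif mode == "default":
        # Fade out finishes at the gap start; fade in begins at the gap end.
        fade_out_begin = start - cf_duration
        fade_in_begin = end
    else:
        raise ValueError("mode must be 'timeline' or 'default'")

    if t <= fade_out_begin:
        return 1.0
    if t < fade_out_begin + cf_duration:
        return 1.0 - (t - fade_out_begin) / cf_duration
    if t <= fade_in_begin:
        return 0.0
    if t < fade_in_begin + cf_duration:
        return (t - fade_in_begin) / cf_duration
    return 1.0


print(first_stream_gain(10, 10, 20, 2, "timeline"))  # 0.5
print(first_stream_gain(10, 10, 20, 2, "default"))   # 0.0
```

With cf_duration=2 and a transition at t=10, the first stream is at half volume at the transition in timeline mode, whereas in default mode it has already reached zero.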
FATE tests for the four cases (timeline mode, default mode, and manual cross fading in both of these modes) have been added to guard against regressions in the future.
The patchset for the filter and the FATE tests is currently undergoing review at FFmpeg Patchwork.
Commit for the code and the documentation
- Support for more types of cross fading (logarithmic, cubic, etc.).