HarshK23/GSoC 2023.md

## GSoC 2023.md

      
FFmpeg - Implementing Audio Overlay Filter | GSoC 2023

This project was part of Google Summer of Code 2023. The aim of the project was to implement the Audio Overlay Filter (aoverlay) as part of FFmpeg's libavfilter library, along with the necessary documentation.
The filter replaces specified sections of an audio stream with audio from a second input stream. Possible use cases include censoring parts of an audio stream, adding a voiceover to correct or replace dialogue, and dubbing in another language.
After the GSoC period ended, FATE (FFmpeg Automated Testing Environment) regression tests were added for the filter.

What work was done

The filter has two modes of operation:


Timeline Mode

The user specifies the time interval during which the second input stream replaces the first input stream in the output.
ffmpeg -i first.wav -i second.wav -filter_complex "aoverlay=enable='between(t,10,20)'" output.wav
The output in this case will have the second stream from t=10 seconds to t=20 seconds.
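
As a rough illustration of the selection logic only, the sketch below mirrors FFmpeg's between(x, min, max) expression, which is true when min <= x <= max, and reports which input would feed the output at a given timestamp. The helper names (between, select_source) and the sample timestamps are hypothetical and are not taken from the filter's source; crossfading is ignored here.

```python
def between(t, lo, hi):
    """Mirror FFmpeg's between(x, min, max): true when lo <= t <= hi."""
    return lo <= t <= hi


def select_source(frame_times, start, end):
    """For each frame timestamp (seconds), return which input feeds the output.

    0 means the first stream, 1 means the second stream.
    Hypothetical helper for illustration; not part of FFmpeg.
    """
    return [1 if between(t, start, end) else 0 for t in frame_times]


# Timestamps around the 10..20 second window used in the command above.
times = [8.0, 9.5, 10.0, 15.0, 20.0, 20.5]
print(select_source(times, 10, 20))
```

Both window boundaries are inclusive, so frames stamped exactly at t=10 and t=20 are taken from the second stream in this sketch.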


Default Mode

If the user specifies no time interval, the filter checks for PTS (Presentation TimeStamp) gaps in the first input stream and inserts the second input stream into those gaps so that the output stream's PTS values are monotonic.
ffmpeg -i first.wav -i second.wav -filter_complex "[0]aselect='not(between(t,4,8))'[temp];[temp][1]aoverlay[out]" -map "[out]" output.wav
The aselect filter in this case rejects all the audio samples from t=4 seconds to t=8 seconds in the first stream, creating a PTS gap. The aoverlay filter then detects the gap and inserts the second input stream from t=4 seconds to t=8 seconds in the output.
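
The gap detection described above can be sketched as follows. This is a minimal illustration, not the filter's actual code: it assumes each frame is described by a hypothetical (pts, nb_samples) pair with PTS counted in samples (a time base of 1/sample_rate), and reports every place where a frame starts later than the previous one ended.

```python
def find_pts_gaps(frames, sample_rate):
    """Return a list of (gap_start, gap_end) pairs in seconds.

    frames: iterable of (pts, nb_samples) pairs, with pts in samples.
    Assumption: the time base is 1/sample_rate. Hypothetical helper,
    not taken from the aoverlay source.
    """
    gaps = []
    expected = None  # PTS at which the next frame should start
    for pts, nb_samples in frames:
        if expected is not None and pts > expected:
            gaps.append((expected / sample_rate, pts / sample_rate))
        expected = pts + nb_samples
    return gaps


# One-second frames at a toy rate of 1000 Hz, with seconds 4..8 dropped,
# similar to what aselect='not(between(t,4,8))' would leave behind.
frames = [(0, 1000), (1000, 1000), (2000, 1000), (3000, 1000),
          (8000, 1000), (9000, 1000)]
print(find_pts_gaps(frames, 1000))
```

Each reported interval is a candidate region where the second input stream would be inserted.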


Crossfading

Linear cross fading is performed between the two streams at each point of transition from one stream to the other.
The user can set the duration of the cross fade, in seconds, with the cf_duration option.
ffmpeg -i first.wav -i second.wav -filter_complex "aoverlay=cf_duration=2:enable='between(t,10,20)'" output.wav
Cross fading works differently in the two modes:


Timeline Mode - The cross fade starts cf_duration/2 seconds before the transition and ends cf_duration/2 seconds after the transition, so that both the input streams are at 50% volume at the transition point.


Default Mode - The cross fade starts cf_duration seconds before the transition when shifting from the first stream to the second stream, such that the first stream's volume becomes 0% at the transition point. When shifting from the second stream back to the first stream at the end of the PTS gap, the cross fade starts at the transition and lasts for cf_duration seconds such that the first stream's volume starts at 0% at the transition and becomes 100% at the end of cf_duration seconds.
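
The timing rules above can be sketched as a gain calculation for the first stream. This is a minimal sketch under stated assumptions, not the filter's implementation: the function names are hypothetical, times are in seconds, and the second stream's gain is assumed to be one minus the first stream's gain.

```python
def first_stream_gain(t, fade_start, cf_duration, fading_out):
    """Linear gain of the first stream at time t (seconds).

    Assumption: the second stream's gain is 1 minus this value.
    Hypothetical helper for illustration; not part of FFmpeg.
    """
    if cf_duration <= 0:
        # No crossfade requested: hard cut at fade_start.
        progress = 1.0 if t >= fade_start else 0.0
    else:
        progress = min(max((t - fade_start) / cf_duration, 0.0), 1.0)
    return 1.0 - progress if fading_out else progress


def timeline_fade_start(transition, cf_duration):
    # Timeline mode: the fade is centred on the transition.
    return transition - cf_duration / 2


def default_fade_out_start(gap_start, cf_duration):
    # Default mode, first -> second: the fade ends at the start of the gap.
    return gap_start - cf_duration


def default_fade_in_start(gap_end, cf_duration):
    # Default mode, second -> first: the fade begins at the end of the gap.
    return gap_end


cf = 2.0
# Timeline mode, first -> second at t=10: evaluated at the transition itself.
print(first_stream_gain(10.0, timeline_fade_start(10.0, cf), cf, fading_out=True))
# Default mode, gap starts at t=10: evaluated at the transition itself.
print(first_stream_gain(10.0, default_fade_out_start(10.0, cf), cf, fading_out=True))
# Default mode, gap ends at t=20: evaluated one second into the fade-in.
print(first_stream_gain(21.0, default_fade_in_start(20.0, cf), cf, fading_out=False))
```

The only difference between the two modes in this sketch is where the fade window starts relative to the transition; the linear ramp itself is the same.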


FATE tests covering four cases (timeline mode, default mode, and manual crossfading in each of these two modes) have been added to guard against future regressions.

What code got merged

The patchset for the filter and the FATE tests is currently undergoing review at FFmpeg Patchwork.

Link to the code

Commit for the code and the documentation
Commit for the FATE tests

What's left to do


Support for more types of cross fading (logarithmic, cubic, etc.).