riotbib/openai-whisper-silence-zdf.md

## openai-whisper-silence-zdf.md

      
OpenAI's Whisper was most likely trained on subtitled videos from the German public-service television broadcaster ZDF.
Whisper "is a general-purpose speech recognition model […] trained on a large dataset of diverse audio", according to the project's README.
A clear indication is that Whisper "transcribes" silent audio as the text "Untertitel im Auftrag des ZDF, 2017" ("Subtitles on behalf of ZDF, 2017").
This sentence appears in videos of ZDF's youth program Funk. One example is the end of a 2017 video from Funk's format musstewissen Mathe.
Thus, Whisper translates silence into copyright notices.
## Steps to reproduce

```console
# generate 23 seconds of silence
$ ffmpeg -f lavfi -i anullsrc=r=11025:cl=mono -t 23 -acodec aac silence.m4a

# transcribe it to text using Whisper
$ whisper silence.m4a --language German
[00:00.000 --> 00:02.000]  Untertitel im Auftrag des ZDF, 2017
```
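
To check for the phrase automatically rather than by eye, the transcript can be piped through a small filter. This is a minimal sketch, assuming a POSIX shell and assuming the Whisper CLI prints its segments to standard output in the format shown above. `contains_zdf_credit` is a hypothetical helper name, not part of Whisper.

```shell
# Hypothetical helper: reads transcript lines on stdin and succeeds
# (exit status 0) if any line contains the ZDF subtitle credit.
contains_zdf_credit() {
  grep -q 'Untertitel im Auftrag des ZDF'
}

# Demonstration on the transcript line reported above.
sample='[00:00.000 --> 00:02.000]  Untertitel im Auftrag des ZDF, 2017'
if printf '%s\n' "$sample" | contains_zdf_credit; then
  echo "credit found"
fi
```

With Whisper installed, the same check could be applied to a live run, for example `whisper silence.m4a --language German | contains_zdf_credit && echo "credit found"`. Results may differ between model sizes and Whisper versions.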