@riotbib
Created June 15, 2023 19:07
"Untertitel im Auftrag des ZDF, 2017"

OpenAI's Whisper was most likely trained on subtitled videos from the German public-service television broadcaster ZDF.

Whisper "is a general-purpose speech recognition model […] trained on a large dataset of diverse audio", according to the project's README.

A clear indication is that Whisper transcribes silent audio as the text "Untertitel im Auftrag des ZDF, 2017" ("Subtitles commissioned by ZDF, 2017").

This sentence can be seen in videos from Funk, ZDF's youth program. One example is a 2017 video from Funk's format musstewissen Mathe, where the line appears at the end of the video.

Thus, Whisper translates silence into copyright notices.

Steps to reproduce

# generate 23 seconds of silence
$ ffmpeg -f lavfi -i anullsrc=r=11025:cl=mono -t 23 -acodec aac silence.m4a

# transcribe it to text using Whisper
$ whisper silence.m4a --language German
[00:00.000 --> 00:02.000]  Untertitel im Auftrag des ZDF, 2017
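
If ffmpeg is not at hand for the first step, a silent input file can also be produced with Python's standard library. This is only a sketch under assumptions: the function name `write_silence` and the output name `silence.wav` are made up for illustration, and the result is an uncompressed WAV file rather than the AAC-encoded `silence.m4a` used above, so Whisper's output may differ from the one shown.

```python
import wave


def write_silence(path, seconds=23, sample_rate=11025):
    """Write a mono 16-bit PCM WAV file that contains only zero samples.

    Hypothetical helper: the defaults mirror the ffmpeg call above
    (23 seconds, 11025 Hz, mono), but the container is WAV, not M4A.
    """
    n_frames = seconds * sample_rate
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample (16-bit PCM)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # every sample is zero
    return n_frames


if __name__ == "__main__":
    write_silence("silence.wav")
```

The resulting file can then be passed to the same command as before, for example `whisper silence.wav --language German`.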
@adiehl96

I just noticed this as well, pretty annoying. User doublex created a dictionary of all known artefacts of this kind, sorted by language, to ease automatic removal.
