@mkmohangb
Created February 16, 2024 12:49
  • Use the OpenAI Whisper API for transcription
  • It supports word-level timestamps.
  • Filler words are not transcribed, and prompting did not help. However, they leave a gap in the word timestamps, which can be used for detection.
    {'word': 'things', 'start': 25.799999237060547, 'end': 26.040000915527344}, 
    {'word': 'So', 'start': 27.239999771118164, 'end': 27.799999237060547}, 
    {'word': 'it', 'start': 27.799999237060547, 'end': 28.0}
    
    • In the above transcription, there is a gap of about 1.2 s between "things" (ends at 26.04) and "So" (starts at 27.24).
  • The select filter in ffmpeg can be used to remove sections of a video, such as filler words. Relevant SO answer.
  • auto-editor: a command-line video editing tool.
  • ffmpeg-python: video editing in memory.
  • Eddie: a smart video editor.
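
The gap-based detection and the select-filter idea from the notes above can be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: the function names `find_gaps` and `select_expr` and the 0.5 s threshold are assumptions made for illustration, and the word list is the sample from the transcription above.

```python
from typing import Dict, List, Tuple, Union

Word = Dict[str, Union[str, float]]
Gap = Tuple[float, float]


def find_gaps(words: List[Word], min_gap: float = 0.5) -> List[Gap]:
    """Return (start, end) spans between consecutive words that are
    longer than min_gap seconds. The threshold is an assumed value."""
    gaps = []
    for prev, cur in zip(words, words[1:]):
        if cur["start"] - prev["end"] > min_gap:
            gaps.append((prev["end"], cur["start"]))
    return gaps


def select_expr(gaps: List[Gap]) -> str:
    """Build an ffmpeg expression that is non-zero (keep the frame)
    only when the timestamp t lies outside every gap."""
    if not gaps:
        return "1"  # nothing to cut: keep every frame
    # Summing the between() terms acts as a logical OR over the gaps.
    inside = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in gaps)
    return f"not({inside})"


# Sample words copied from the transcription above.
words = [
    {"word": "things", "start": 25.799999237060547, "end": 26.040000915527344},
    {"word": "So", "start": 27.239999771118164, "end": 27.799999237060547},
    {"word": "it", "start": 27.799999237060547, "end": 28.0},
]

gaps = find_gaps(words)
expr = select_expr(gaps)
print(expr)
```

The resulting expression could then be passed to both the video and audio select filters, for example `-vf "select='EXPR',setpts=N/FRAME_RATE/TB" -af "aselect='EXPR',asetpts=N/SR/TB"`, so that the remaining frames and samples are retimed after the cut. Not every gap is necessarily a filler word; a pause would be detected the same way, so the threshold likely needs tuning per recording.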