- Use the OpenAI Whisper API for transcription
- It supports word-level timestamps.
- Filler words are not transcribed, and prompting did not help either. However, there is a gap in the word timestamps where a filler word was spoken, and that gap can be used to detect it.
  `{'word': 'things', 'start': 25.799999237060547, 'end': 26.040000915527344}, {'word': 'So', 'start': 27.239999771118164, 'end': 27.799999237060547}, {'word': 'it', 'start': 27.799999237060547, 'end': 28.0}`
- In the transcription above, there is a gap of ~1.2 s between `things` (ends at 26.04 s) and `So` (starts at 27.24 s).
- The `select` filter in ffmpeg can be used to remove sections of a video, such as filler words. Relevant SO answer.
- auto-editor: a command-line video editing tool.
- ffmpeg-python: video editing in memory
- Eddie: smart video editor
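A sketch of how the ffmpeg `select` filter mentioned above can drop time ranges. This follows the common `select`/`aselect` plus `setpts`/`asetpts` pattern and is not necessarily the exact command from the referenced SO answer. `between(t,a,b)` and `not(x)` are standard ffmpeg expression functions. The helper names and file names are hypothetical. The command re-encodes the output, and `setpts=N/FRAME_RATE/TB` assumes a constant frame rate.

```python
import subprocess


def drop_ranges_expr(ranges: list[tuple[float, float]]) -> str:
    """Build an ffmpeg expression that is non-zero only outside every (start, end) range."""
    terms = "+".join(f"between(t,{start:.3f},{end:.3f})" for start, end in ranges)
    # With no ranges to drop, keep everything ("1" is always true).
    return f"not({terms})" if terms else "1"


def cut_ranges_cmd(src: str, dst: str, ranges: list[tuple[float, float]]) -> list[str]:
    """Build an ffmpeg argument list that removes the given ranges from both video and audio."""
    expr = drop_ranges_expr(ranges)
    return [
        "ffmpeg", "-y", "-i", src,
        # Keep frames where expr is true, then renumber timestamps so the cuts leave no holes.
        "-vf", f"select='{expr}',setpts=N/FRAME_RATE/TB",
        # Same selection on the audio samples, retimed by sample rate.
        "-af", f"aselect='{expr}',asetpts=N/SR/TB",
        dst,
    ]


def cut_ranges(src: str, dst: str, ranges: list[tuple[float, float]]) -> None:
    """Run the command; requires the ffmpeg binary on PATH."""
    subprocess.run(cut_ranges_cmd(src, dst, ranges), check=True)
```

For example, `cut_ranges("input.mp4", "output.mp4", [(26.04, 27.24)])` would drop the gap found between `things` and `So`. Passing a list instead of a shell string avoids a second layer of quoting. The single quotes are then parsed by ffmpeg's filtergraph parser, so the commas inside the expression stay intact.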
Created February 16, 2024 12:49
Gist: mkmohangb/7f49f690b56994738e388c63c84174c0