Created October 14, 2022 06:41
from tqdm import tqdm

# `paths`, `model` and `videos_dict` are assumed to be defined earlier
data = []
for path in tqdm(paths):
    # the file name minus its last 4 characters (the extension) is the video ID
    _id = path.split('/')[-1][:-4]
    # transcribe to get speech-to-text data
    result = model.transcribe(path)
    segments = result['segments']
    # get the video metadata...
    video_meta = videos_dict[_id]
    for segment in segments:
        # merge segments data and videos_meta data
        meta = {
            **video_meta,
            **{
                # use the loop variable here; the original indexed
                # segments[j] with a `j` that was never defined
                "id": f"{_id}-t{segment['start']}",
                "text": segment["text"].strip(),
                "start": segment['start'],
                "end": segment['end']
            }
        }
        data.append(meta)
It seems that the j iterator is never declared before line 15, and in fact it is not needed:
segments[j]
can simply be segment, the loop variable. The j probably comes from an earlier version that enumerated the segments list.
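For anyone who wants to check the fix without running a transcription model, here is a minimal sketch of the inner loop on its own. The helper name `build_segment_records` and the sample data are made up for illustration; only the dictionary keys come from the gist.

```python
def build_segment_records(video_id, video_meta, segments):
    """Merge per-video metadata with each transcribed segment.

    Hypothetical helper: it mirrors the inner loop of the gist but
    reads from the loop variable `segment` instead of `segments[j]`.
    """
    records = []
    for segment in segments:
        records.append({
            **video_meta,
            "id": f"{video_id}-t{segment['start']}",
            "text": segment["text"].strip(),
            "start": segment["start"],
            "end": segment["end"],
        })
    return records


# Made-up sample data standing in for result['segments'] and videos_dict[_id]
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there. "},
    {"start": 2.5, "end": 5.0, "text": " Second segment."},
]
records = build_segment_records("abc123", {"title": "Demo video"}, sample_segments)

for record in records:
    print(record["id"], record["text"])
```

If an index really is needed, for example to number the segments, `for j, segment in enumerate(segments):` would define `j` properly, but the ID only depends on the segment's own start time.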