@aleksandr-smechov
Created November 8, 2023 17:44
Estimate None timestamps for Whisper output
def estimate_none_timestamps(timestamp_list):
    """
    Estimates missing (None) timestamps in a list of timestamped text segments,
    using the average duration per character of the segments whose timestamps are known.

    Parameters:
        timestamp_list (list): Segments as dicts with a "text" string and a
            "timestamp" (start, end) pair; either value may be None.

    Returns:
        list: The same list, modified in place, with missing timestamps estimated.
    """
    total_duration = 0
    total_characters = 0

    # First pass: accumulate duration and character counts from complete segments.
    for segment in timestamp_list:
        start, end = segment["timestamp"]
        if start is not None and end is not None:
            total_duration += end - start
            total_characters += len(segment["text"])

    if total_characters > 0:
        avg_duration_per_char = total_duration / total_characters
    else:
        avg_duration_per_char = 0.1  # Default duration per character (assumed)

    # Second pass: fill in missing values. Segments are fixed in order, so the
    # previous segment's end has already been estimated by the time it is read.
    for i, segment in enumerate(timestamp_list):
        start, end = segment["timestamp"]
        estimated_duration = len(segment["text"]) * avg_duration_per_char

        if start is None:
            # Start where the previous segment ended, or at 0 if that is unavailable.
            previous_end = timestamp_list[i - 1]["timestamp"][1] if i > 0 else None
            start = previous_end if previous_end is not None else 0
            segment["timestamp"] = (start, start + estimated_duration)
        elif end is None:
            segment["timestamp"] = (start, start + estimated_duration)

    return timestamp_list
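
The arithmetic behind the estimate can be checked by hand on a small input. The sketch below does not call the function; it reproduces the same per-character averaging on made-up sample data, so the segment texts, timestamps, and variable names (`chunks`, `known`, `estimated_end`) are all hypothetical.

```python
# Hypothetical sample data: dicts with "text" and a (start, end) "timestamp",
# the shape the function above expects. The last segment has no end time.
chunks = [
    {"text": "Hello there.", "timestamp": (0.0, 1.2)},    # 12 characters
    {"text": "How are you?", "timestamp": (1.2, 2.4)},    # 12 characters
    {"text": "Fine, thanks.", "timestamp": (2.4, None)},  # 13 characters
]

# Only segments with both timestamps contribute to the average.
known = [c for c in chunks if None not in c["timestamp"]]
total_duration = sum(c["timestamp"][1] - c["timestamp"][0] for c in known)
total_characters = sum(len(c["text"]) for c in known)

# Roughly 2.4 s over 24 characters, i.e. about 0.1 s per character.
avg_duration_per_char = total_duration / total_characters

# The missing end is the known start plus characters * average duration,
# which comes out at about 3.7 s for the 13-character segment.
last = chunks[-1]
estimated_end = last["timestamp"][0] + len(last["text"]) * avg_duration_per_char
print(round(estimated_end, 2))
```

Because the estimate depends only on text length, a long pause or fast speech in the unknown segment is not captured; the result is a rough placeholder rather than a measured time.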