@nathangathright
Last active April 12, 2023 17:39
Convert Deepgram JSON to PodcastNamespace JSON
# Convert Deepgram JSON to PodcastNamespace JSON
# Usage: python deepgram.py <input.json>
import json
import os
import sys

# Load the Deepgram response
with open(sys.argv[1]) as f:
    data = json.load(f)

# Get the utterances (only present if utterances were enabled in the Deepgram request)
utterances = data['results']['utterances']

# Create the transcript object
transcript = {
    'version': '1.0.0',
    'segments': []
}

# For each utterance, get the words array
for utterance in utterances:
    words = utterance['words']
    # For each word, get the start, end, speaker, and punctuated_word
    for word in words:
        # Create one segment per word
        segment = {
            # 'speaker': word['speaker'],
            'startTime': word['start'],
            'endTime': word['end'],
            'body': word['punctuated_word']
        }
        # Add the segment to the segments array
        transcript['segments'].append(segment)

# Save the transcript next to the input file, e.g. input.json -> input-namespace.json.
# os.path.splitext strips only the final extension, so paths such as
# ./input.json or my.show.json keep the rest of their name.
base, _ = os.path.splitext(sys.argv[1])
with open(base + '-namespace.json', 'w') as f:
    json.dump(transcript, f, indent=2)
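
The script above emits one segment per word. As a rough sketch of an alternative, the variant below emits one segment per utterance instead. It is not part of the original gist: the helper name `utterances_to_segments` and the sample data are made up for illustration, and it assumes each utterance carries top-level `start`, `end`, and `transcript` fields, with `speaker` present only when diarization was requested.

```python
import json


def utterances_to_segments(data):
    """Hypothetical helper: build one transcript segment per utterance."""
    segments = []
    for utterance in data['results']['utterances']:
        segment = {
            'startTime': utterance['start'],
            'endTime': utterance['end'],
            'body': utterance['transcript'],
        }
        # Assumption: 'speaker' is only present when diarization was enabled,
        # so copy it across only if it exists.
        if 'speaker' in utterance:
            segment['speaker'] = utterance['speaker']
        segments.append(segment)
    return {'version': '1.0.0', 'segments': segments}


# Hand-made sample shaped like the input the script above reads (not real output)
sample = {
    'results': {
        'utterances': [
            {'start': 0.0, 'end': 1.2, 'transcript': 'Hello there.', 'speaker': 0},
            {'start': 1.5, 'end': 2.4, 'transcript': 'Hi.'},
        ]
    }
}

transcript = utterances_to_segments(sample)
print(json.dumps(transcript, indent=2))
```

Utterance-level segments give a much smaller file and read more naturally as captions. Word-level segments, as in the original script, keep finer timing.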