@cbare
Created July 11, 2018 00:00
Reformat the output of Amazon Transcribe so it's readable.
"""
Reformat the output of Amazon Transcribe
usage: python3 speakers.py asrOutput.json
"""
import json
import sys
# read in the JSON output of Amazon Transcribe
with open(sys.argv[1]) as f:
    t = json.load(f)
i = 0
# items seem to be either individual words or punctuation
items = t['results']['items']
# segments are runs of items spoken by one speaker, in other words, one turn
# in the conversation.
segments = t['results']['speaker_labels']['segments']
# step through all the segments joining up the items that make up the segment
for segment in segments:
    print('\n', segment['speaker_label'])
    contents = []
    for seg_item in segment['items']:
        assert seg_item['speaker_label'] == segment['speaker_label']
        item = items[i]
        if item['type'] == 'pronunciation':
            contents.append(' ')
            # only spoken words carry timestamps, so check alignment here
            assert item['start_time'] == seg_item['start_time'], 'item=' + str(item) + '\n' + 'seg_item=' + str(seg_item)
            assert item['end_time'] == seg_item['end_time'], 'item=' + str(item) + '\n' + 'seg_item=' + str(seg_item)
        contents.append(item['alternatives'][0]['content'])
        i += 1
        # attach punctuation that follows the word; the bounds check avoids an
        # IndexError when the transcript ends on a word
        if i < len(items) and items[i]['type'] == 'punctuation':
            contents.append(items[i]['alternatives'][0]['content'])
            i += 1
    print(''.join(contents))
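The same walk over items and segments can be sketched as a function and tried on a small input without a real transcript. This is only a sketch: the name format_transcript and the sample dictionary are hypothetical and hand-written, containing just the fields the script above reads. Real Amazon Transcribe output has more fields and may differ in detail.

```python
def format_transcript(t):
    """Return a list of (speaker_label, text) pairs, one per segment."""
    items = t['results']['items']
    segments = t['results']['speaker_labels']['segments']
    i = 0  # index into items, advanced as segments are consumed
    turns = []
    for segment in segments:
        words = []
        for seg_item in segment['items']:
            item = items[i]
            # assumption: each segment item lines up with one spoken word
            assert item['type'] == 'pronunciation'
            assert item['start_time'] == seg_item['start_time']
            words.append(' ' + item['alternatives'][0]['content'])
            i += 1
            # attach any punctuation that directly follows the word
            while i < len(items) and items[i]['type'] == 'punctuation':
                words[-1] += items[i]['alternatives'][0]['content']
                i += 1
        turns.append((segment['speaker_label'], ''.join(words).strip()))
    return turns


# Hypothetical, hand-written sample with only the fields used above.
sample = {
    'results': {
        'items': [
            {'type': 'pronunciation', 'start_time': '0.0', 'end_time': '0.4',
             'alternatives': [{'content': 'Hello'}]},
            {'type': 'punctuation', 'alternatives': [{'content': ','}]},
            {'type': 'pronunciation', 'start_time': '0.5', 'end_time': '0.9',
             'alternatives': [{'content': 'world'}]},
            {'type': 'pronunciation', 'start_time': '1.2', 'end_time': '1.5',
             'alternatives': [{'content': 'Hi'}]},
        ],
        'speaker_labels': {
            'segments': [
                {'speaker_label': 'spk_0', 'items': [
                    {'speaker_label': 'spk_0', 'start_time': '0.0', 'end_time': '0.4'},
                    {'speaker_label': 'spk_0', 'start_time': '0.5', 'end_time': '0.9'},
                ]},
                {'speaker_label': 'spk_1', 'items': [
                    {'speaker_label': 'spk_1', 'start_time': '1.2', 'end_time': '1.5'},
                ]},
            ],
        },
    },
}

for speaker, text in format_transcript(sample):
    print(speaker + ': ' + text)
```

Returning the pairs instead of printing them makes the logic easy to check on small inputs, and the loop at the end gives one line per speaker turn.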