@markjenkins
Created September 25, 2024 05:07
A simple Python script that takes an AWS Transcribe JSON file with speaker labels and prints a plain-text transcript
#!/usr/bin/env python3
# Take an AWS Transcribe JSON file with speaker labels from stdin and
# print out a plain-text transcript with each segment's speaker_label
# at the start of the line, followed by a colon
# Copyright Mark Jenkins <mark@markjenkins.ca>
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved. This file is offered as-is,
# without any warranty.
from sys import stdin
from json import load as json_load
transcript_json = json_load(stdin)
# Each audio segment carries one speaker label and that speaker's text
for seg in transcript_json['results']['audio_segments']:
    print(seg['speaker_label'] + ":", seg['transcript'])
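
To make the expected input shape concrete, here is a minimal sketch that runs the same loop over an inline sample. The sample JSON is hypothetical and trimmed to only the two fields the script reads (real AWS Transcribe output contains many more), and `format_transcript` is an illustrative helper name, not part of the original script.

```python
import json

# Hypothetical sample, trimmed to only the fields the script reads.
SAMPLE = """
{
  "results": {
    "audio_segments": [
      {"speaker_label": "spk_0", "transcript": "Hello, thanks for joining."},
      {"speaker_label": "spk_1", "transcript": "Happy to be here."}
    ]
  }
}
"""


def format_transcript(transcript_json):
    """Return one 'speaker: text' line per audio segment."""
    return [
        seg["speaker_label"] + ": " + seg["transcript"]
        for seg in transcript_json["results"]["audio_segments"]
    ]


lines = format_transcript(json.loads(SAMPLE))
print("\n".join(lines))
```

Returning a list of lines rather than printing directly makes the formatting easy to check in isolation; the original script reads from stdin instead, so it would be fed a file by shell redirection.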