
@RoxanaTapia
RoxanaTapia / output_establish_differences.json
Created November 24, 2020 21:51
Sample output of Establish Differences.
{
"pairwiseDiffs":[
{
"fileA":"among-us-baseline.txt",
"fileB":"among-us-correct.mp3",
"granularity":"word",
"differences":[
{
"position":0,
"segmentFromA":"I",
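A minimal sketch for consuming a file shaped like this sample. It assumes the top-level `pairwiseDiffs` list and the `fileA`, `fileB`, `differences`, `position`, `segmentFromA` and `segmentFromB` fields seen in these gists, and counts how many positions disagree per file pair. The function name `summarize_pairwise_diffs` is hypothetical.

```python
import json


def summarize_pairwise_diffs(raw: str) -> list[tuple[str, str, int, int]]:
    """Return (fileA, fileB, total positions, mismatching positions) per pair.

    Hypothetical helper. Assumes a top-level "pairwiseDiffs" list whose entries
    carry fileA, fileB and a "differences" list of records with position,
    segmentFromA and segmentFromB, as in the sample output.
    """
    data = json.loads(raw)
    summary = []
    for pair in data["pairwiseDiffs"]:
        diffs = pair["differences"]
        # A position counts as a mismatch when the two files disagree on it.
        mismatches = sum(
            1 for d in diffs if d.get("segmentFromA") != d.get("segmentFromB")
        )
        summary.append((pair["fileA"], pair["fileB"], len(diffs), mismatches))
    return summary
```

Because the sample lists position 0 even though both segments are "I", the sketch treats `differences` as one record per position and computes mismatches itself rather than assuming every record is a disagreement.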
RoxanaTapia / pairwise_diffs.json
Created November 24, 2020 17:29
Sample pairwiseDiffs generated by Establish Differences
[
{
"fileA":"among-us-baseline.txt",
"fileB":"among-us-correct.mp3",
"granularity":"word",
"differences":[
{
"position":0,
"segmentFromA":"I",
"segmentFromB":"I",
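One way an entry like this could be produced from two segment lists, sketched under an explicit simplification: segments are aligned naively by position, which is not necessarily how Establish Differences aligns them. The function name `pairwise_diff` is hypothetical; the output keys match the sample.

```python
from itertools import zip_longest


def pairwise_diff(file_a: str, segs_a: list[str],
                  file_b: str, segs_b: list[str],
                  granularity: str = "word") -> dict:
    """Build one pairwiseDiffs entry by comparing segments position by position.

    Hypothetical sketch using naive positional alignment (an assumption).
    """
    differences = []
    # zip_longest pads the shorter list with None so no position is dropped.
    for position, (a, b) in enumerate(zip_longest(segs_a, segs_b)):
        differences.append({
            "position": position,
            "segmentFromA": a,
            "segmentFromB": b,
        })
    return {
        "fileA": file_a,
        "fileB": file_b,
        "granularity": granularity,
        "differences": differences,
    }
```

Positional alignment is adequate when both transcripts have the same word count, as the two Among Us transcripts do; an inserted or dropped word would shift every later position, which a sequence-alignment approach such as Python's `difflib.SequenceMatcher` would handle better.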
RoxanaTapia / jobs.json
Created November 24, 2020 17:09
Sample jobs file for Establish Differences
{
"name":"among-us-speech",
"sources":[
{
"filename":"among-us-baseline.txt",
"segments":[
"I",
"was",
"playing",
"Among",
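A minimal sketch of turning transcript records into a jobs object like this one, with one word-segment list per source. The tokenizer is an assumption: the preview only shows "I", "was", "playing", "Among", so how Pinocchio treats punctuation and apostrophes is not known. `build_job` and `WORD` are hypothetical names.

```python
import re

# Assumed tokenizer: keeps letters, digits and apostrophes (straight or curly)
# and drops punctuation such as "." and ",". The real segmentation rule is not shown.
WORD = re.compile(r"[\w'’]+")


def build_job(name: str, transcripts: list[dict]) -> dict:
    """Turn transcript records into a jobs object with word segments per source."""
    return {
        "name": name,
        "sources": [
            {"filename": t["filename"], "segments": WORD.findall(t["transcript"])}
            for t in transcripts
        ],
    }
```

Called as `build_job("among-us-speech", records)` on the transcript records shown in the other gists, the baseline source would start with the four segments visible in the preview.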
RoxanaTapia / extract_transcripts_output.json
Last active November 24, 2020 15:21
Sample output of Extract Transcripts
[
{
"filename":"among-us-baseline.txt",
"location":"./source/media/among-us-baseline.txt",
"media_type":"txt",
"extractor":"read",
"transcript":"I was playing Among Us and I wasn’t the imposter. Everybody voted that I was the imposter. So I can’t continue playing."
},
{
"filename":"among-us-correct.mp3",
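The `"extractor":"read"` record for the text baseline suggests plain-text files are simply read from disk. A sketch of such an extractor, assuming `media_type` is the lowercased file extension and `location` is the path as given; `read_extractor` is a hypothetical name, and only the field names come from the sample.

```python
import os


def read_extractor(path: str) -> dict:
    """Build a transcript record for a plain-text file, mirroring the sample fields.

    Hypothetical helper: the record layout is copied from the sample output,
    the logic is an illustration.
    """
    with open(path, encoding="utf-8") as f:
        transcript = f.read().strip()
    filename = os.path.basename(path)
    # media_type is taken from the extension without the dot, e.g. "txt".
    media_type = os.path.splitext(filename)[1].lstrip(".").lower()
    return {
        "filename": filename,
        "location": path,
        "media_type": media_type,
        "extractor": "read",
        "transcript": transcript,
    }
```

For example, `read_extractor("./source/media/among-us-baseline.txt")` would produce a record with the same keys as the first entry above.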
RoxanaTapia / aws_transcripts.json
Created November 24, 2020 15:11
Sample AWS transcripts
[
{
"filename":"among-us-correct.mp3",
"location":"https://s3.amazonaws.com/pinocchio-inputs/among-us-correct.mp3",
"media_type":"mp3",
"job":"among-us-speech-among-us-correct.mp3",
"status":"COMPLETED",
"transcript":"I was playing among us and I was the imposter. Everybody voted that I wasn't the imposter, so I can continue playing.",
"metadata":{
"jobName":"among-us-speech-among-us-correct.mp3",
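The job name in this record appears to be the jobs-file name joined to the media filename with a hyphen; that pattern is inferred from this one sample. A sketch that builds the keyword arguments for the boto3 Transcribe client's `start_transcription_job`. The helper name `transcribe_request` is hypothetical, and `language_code="en-US"` is an assumption not stated in the source.

```python
def transcribe_request(job_name: str, filename: str, location: str,
                       language_code: str = "en-US") -> dict:
    """Keyword arguments for boto3's Transcribe start_transcription_job.

    The "<job name>-<filename>" pattern is inferred from the sample record;
    the default language_code is an assumption.
    """
    # MediaFormat is taken from the file extension, e.g. "mp3".
    media_format = filename.rsplit(".", 1)[-1].lower()
    return {
        "TranscriptionJobName": f"{job_name}-{filename}",
        "Media": {"MediaFileUri": location},
        "MediaFormat": media_format,
        "LanguageCode": language_code,
    }
```

With AWS credentials configured, the dict could be passed as `boto3.client("transcribe").start_transcription_job(**params)`, and `get_transcription_job(TranscriptionJobName=...)` polled until `response["TranscriptionJob"]["TranscriptionJobStatus"]` is `COMPLETED`, the status recorded in the sample.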
RoxanaTapia / text_transcripts.json
Created November 24, 2020 14:58
Sample baseline output (text transcripts)
[
{
"filename":"among-us-baseline.txt",
"location":"./source/media/among-us-baseline.txt",
"media_type":"txt",
"extractor":"read",
"transcript":"I was playing Among Us and I wasn’t the imposter. Everybody voted that I was the imposter. So I can’t continue playing."
}
]
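The baseline uses a typographic apostrophe ("wasn’t") and title case ("Among Us"), while the AWS transcript uses "wasn't" and "among us", so a word-by-word comparison would flag these as differences even where the words agree. The source does not say whether Pinocchio normalizes transcripts; the sketch below is an optional, hypothetical pre-processing step (`normalize_transcript` is not part of the project).

```python
# Map typographic quotes to their ASCII equivalents (an assumed normalization).
_QUOTES = str.maketrans({"\u2019": "'", "\u2018": "'", "\u201c": '"', "\u201d": '"'})


def normalize_transcript(text: str) -> str:
    """Normalize a transcript before word-level comparison (hypothetical step)."""
    text = text.translate(_QUOTES)
    # casefold() is a more aggressive lower() intended for caseless matching;
    # split()/join collapses runs of whitespace to single spaces.
    return " ".join(text.casefold().split())
```

Whether to normalize is a design choice: it hides typographic and casing noise, but it also hides casing differences that some evaluations may want to keep.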
RoxanaTapia / config.yml
Created November 24, 2020 14:50
Pinocchio configuration
# configuration for each abstraction component
# plugins: module path of the component's plugins
# impl: class to import from the plugins module
extract_transcripts:
  plugins: extract_transcripts.plugins
  impl: AWSTranscribe
  # input is a directory filled with media files
  input: ./source/media
  # output is a directory to write the output JSON to
  output: ./outputs/extract_transcripts
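The config comments say `plugins` points to a module and `impl` names a class importable from it. A minimal sketch of resolving that pair with `importlib`, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, not shown). `load_component` is a hypothetical name; how Pinocchio actually instantiates the class is not shown in the source.

```python
import importlib


def load_component(config: dict, component: str):
    """Resolve a component's implementation class from its config section.

    Assumes each component section has a dotted module path under "plugins"
    and a class name under "impl", as in config.yml.
    """
    section = config[component]
    module = importlib.import_module(section["plugins"])
    # getattr raises AttributeError if the module does not define the class.
    return getattr(module, section["impl"])
```

With the configuration above, `load_component(config, "extract_transcripts")` would import `extract_transcripts.plugins` and return its `AWSTranscribe` class; constructor arguments would depend on the project's own code.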