@shuuuuun
Created January 22, 2020 14:40
Extract each speaker's sentences from the JSON file output by Amazon Transcribe and save them to a txt file
#!/usr/bin/env ruby
# Extract each speaker's sentences from the JSON file output by
# Amazon Transcribe and save them to a txt file.
require 'json'

# Reads one Transcribe JSON file and writes one "speaker: sentence" line
# per speaker segment to output_file.
def convert(input_file, output_file)
  input = JSON.parse(File.read(input_file))
  results = input['results']
  speaker_labels = results['speaker_labels']
  items = results['items']

  text = speaker_labels['segments'].map do |segment|
    # Each segment item is matched to the transcript items that share its
    # start_time; their contents are concatenated with no separator.
    sentence = segment['items'].map do |segment_item|
      items.select do |item|
        segment_item['start_time'] == item['start_time']
      end.map do |item|
        item['alternatives'].map { |d| d['content'] }.join('')
      end.join('')
    end.join('')
    "#{segment['speaker_label']}: #{sentence}"
  end.join("\n")

  File.write(output_file, text)
end

# Convert every *.json file that sits next to this script.
BASE_PATH = File.expand_path('.', __dir__)
input_files = Dir.glob('*.json', base: BASE_PATH).map { |path| File.join(BASE_PATH, path) }
input_files.each do |input_file|
  # Escape the dot so only a literal ".json" extension is replaced.
  output_file = input_file.sub(/\.json\z/, '.txt')
  puts "converting... #{input_file} -> #{output_file}"
  convert input_file, output_file
end
puts 'Done!'
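
The script above looks up matching items with a nested `select`, which rescans the whole `items` array for every segment item. As a rough sketch only, the same extraction can be done on an in-memory Hash with a one-time `group_by` on `start_time`. The names `extract_sentences` and `SAMPLE` are hypothetical, and the sample data is made up; it only mirrors the keys the script reads and is not a full Amazon Transcribe output.

```ruby
require 'json'

# Hypothetical sample, shaped after the keys the script above reads.
SAMPLE = {
  'results' => {
    'speaker_labels' => {
      'segments' => [
        { 'speaker_label' => 'spk_0',
          'items' => [{ 'start_time' => '0.0' }, { 'start_time' => '0.5' }] },
        { 'speaker_label' => 'spk_1',
          'items' => [{ 'start_time' => '1.2' }] }
      ]
    },
    'items' => [
      { 'start_time' => '0.0', 'alternatives' => [{ 'content' => 'hello' }] },
      { 'start_time' => '0.5', 'alternatives' => [{ 'content' => 'world' }] },
      # No start_time here, so no segment item in this sample matches it.
      { 'alternatives' => [{ 'content' => '.' }] },
      { 'start_time' => '1.2', 'alternatives' => [{ 'content' => 'yes' }] }
    ]
  }
}.freeze

# Builds the same "speaker: sentence" lines as convert, but from a Hash
# and without touching the file system.
def extract_sentences(input)
  results = input['results']
  # Index the transcript items by start_time once instead of scanning
  # the whole array for every segment item.
  by_start = results['items'].group_by { |item| item['start_time'] }

  results['speaker_labels']['segments'].map do |segment|
    sentence = segment['items'].map do |segment_item|
      by_start.fetch(segment_item['start_time'], []).map do |item|
        item['alternatives'].map { |alt| alt['content'] }.join
      end.join
    end.join
    "#{segment['speaker_label']}: #{sentence}"
  end.join("\n")
end

puts extract_sentences(SAMPLE)
```

As in the script, tokens are concatenated with no separator, which fits text written without spaces between words; for space-delimited languages a different join would probably be needed.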