@chezou
Last active May 11, 2019 16:35
Example code of ruby with Amazon Polly
source 'https://rubygems.org'
gem 'nokogiri', '~>1.6'
gem 'aws-sdk', '~> 2'
require 'aws-sdk'
require 'nokogiri'
require 'open-uri'
class Synthesizer
  def initialize(region = 'us-east-1')
    @polly = Aws::Polly::Client.new(region: region)
  end

  # Synthesizes `text` and writes the MP3 audio to `file_name`.
  def synthesize(text, file_name = "./tmp.mp3", voice_id = "Joanna")
    @polly.synthesize_speech(
      response_target: file_name,
      text: text,
      output_format: "mp3",
      # Voice IDs are listed at http://docs.aws.amazon.com/polly/latest/dg/API_Voice.html
      # For Japanese speech, use "Mizuki".
      voice_id: voice_id
    )
  end
end
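module TextFetcher
  # Fetches `url`, extracts the text of the nodes matching `xpath`, and groups
  # it into chunks of about 1500 characters.
  def self.fetch_text_from(url, xpath)
    charset = nil
    # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
    html = URI.open(url) do |f|
      charset = f.charset
      f.read
    end
    doc = Nokogiri::HTML.parse(html, nil, charset)
    node_texts = doc.xpath(xpath).map(&:text)
    combined_texts = []
    tmp_string = +""
    node_texts.each do |text|
      if tmp_string.size + text.size > 1500
        # Skip the empty buffer that occurs when the very first text is oversized.
        combined_texts << tmp_string unless tmp_string.empty?
        tmp_string = text.dup
      else
        tmp_string << " #{text}"
      end
    end
    combined_texts << tmp_string unless tmp_string.empty?
    combined_texts
  end
end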
if __FILE__ == $0
  synthesizer = Synthesizer.new
  url = "https://medium.com/@chezou/building-predictive-model-with-ibis-impala-and-scikit-learn-356b41f404e0#.xeiwrmhhb"
  # This XPath assumes Medium's article markup
  xpath = '//section//text()'
  input_texts = TextFetcher.fetch_text_from(url, xpath)
  input_texts.each.with_index do |text, i|
    # Zero-pad the index so the shell glob below expands in numeric order
    synthesizer.synthesize(text, format("./tmp_%03d.mp3", i))
    sleep(1)
  end
  # On Linux-based systems, cat can combine the MP3 files
  `cat ./tmp_*.mp3 > ./combined.mp3`
end
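
The final `cat` step depends on a Unix shell. A minimal sketch of the same concatenation in plain Ruby follows; `combine_parts` is a hypothetical helper, not part of the original gist, and it assumes (as the `cat` approach does) that the MP3 parts can simply be joined byte for byte. It sorts the parts by the number in the file name, so the order does not depend on how the shell expands the glob.

```ruby
# Hypothetical helper (not in the original gist): joins the MP3 parts matched
# by `pattern` into `output_path`, ordered by the number in each file name.
def combine_parts(pattern, output_path)
  # Collect the parts first, sorted numerically (tmp_2 before tmp_10).
  parts = Dir.glob(pattern).sort_by { |path| path[/(\d+)\.mp3\z/, 1].to_i }
  File.open(output_path, "wb") do |out|
    parts.each { |path| out.write(File.binread(path)) }
  end
  parts
end

# Example call, mirroring the cat command above:
#   combine_parts("./tmp_*.mp3", "./combined.mp3")
```

The output file name should not match the glob pattern, otherwise a previous combined file would be picked up as a part on the next run.
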
@mkuendig

Thanks a lot for this. I have used the snippets, and I think there is an error in the each loop over the array where you build the 1500-byte chunks: in my case it repeats certain sentences. Could you please review?

@mkuendig

OK, I found the duplication issue with help from Stack Overflow. The correct version is:

node_texts.each do |text|
  if tmp_string.size + text.size > 1500
    combined_texts << tmp_string
    tmp_string = ""
  end
  tmp_string << " #{text}"
end
combined_texts << tmp_string

or even better:

if tmp_string.size + text.size > 1500
  combined_texts << tmp_string
  tmp_string = text
else
  tmp_string << " #{text}"
end
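
The chunking logic discussed above can be tried on its own, without Polly or Nokogiri. The following is a minimal sketch under stated assumptions: `chunk_texts` and its `limit` parameter are hypothetical names introduced for illustration, and the 1500 default simply mirrors the value used in the gist. Each input string ends up in exactly one chunk, so nothing is repeated.

```ruby
# Hypothetical helper (not in the original gist): groups strings into
# space-joined chunks of at most `limit` characters, never repeating a string.
def chunk_texts(texts, limit = 1500)
  chunks = []
  current = +""
  texts.each do |text|
    # Flush the buffer when adding a space plus the next text would exceed the limit.
    if !current.empty? && current.size + 1 + text.size > limit
      chunks << current
      current = +""
    end
    current << " " unless current.empty?
    current << text
  end
  chunks << current unless current.empty?
  chunks
end

chunks = chunk_texts(%w[aaa bbb ccc ddd], 7)
# chunks == ["aaa bbb", "ccc ddd"]
```

A single string longer than `limit` still becomes its own oversized chunk in this sketch; splitting such a string further is left out for brevity.
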

@chezou
Author

chezou commented Jan 8, 2017

@mkuendig Thanks for your patch! I fixed it :)

@chezou
Author

chezou commented Jan 15, 2017
