coding-youtuber/daihon.txt

## daihon.txt
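Hello. I'm Naoya, an engineer.
Today we are going to do speech synthesis.
Recording audio for videos is a lot of work, so I will use programming to make it more efficient.
For speech synthesis, we will use Google's Cloud Text-to-Speech.
You can easily try speech synthesis on Google's site, so please give it a try.
This is a paid service.
However, it costs 16 dollars per one million characters, so I think it is cheap.
Here is the Text-to-Speech documentation.
If you read this documentation, you can do it yourself.
First, we set up the environment by following the quickstart.
We start by creating a project.
Click the create project button.
Fill in the required fields.
This is the console.
You need to set up a payment method.
If you have not set up a payment method yet, please do so.
You need to enter your personal information and link a credit card.
Next, we enable the API.
Next, we create a service account key.
Select JSON.
You do not need to select a role.
Next, we set an environment variable.
First, move the downloaded JSON file to the working directory.
Then add the environment variable setting to your bash or zsh configuration file.
Next is the Python environment setup.
I start by setting it up with Python 3.6.2.
I switch versions with pyenv.
I copy the pip install command and run it.
An error occurred.
It looks like an OpenSSL error.
I tried various things, but in the end I fixed it by moving to a newer version of Python.
Now let's write the actual code.
Copy the code shown in the documentation.
Create main.py and paste it in.
When I ran it, audio was output.
This time, let's output Japanese.
Next, we change the type of voice.
We list all of the supported voices.
Again, copy and paste the code.
When I ran it, the voice information was printed. We will use this later.
Next, we output multiple sentences.
We do this with a loop and a list.
I rewrite the code and run it.
Here is the output.
There are two main types of speech synthesis: Standard and WaveNet.
WaveNet is the newer technology and performs better.
From the voice list we got earlier, we look up the names of the Japanese WaveNet voices.
We output with WaveNet.
I could not really tell the difference in the results.
However, these are simple sentences.
I think the difference will show up with longer text.
If you found today's video interesting or useful, please subscribe to the channel.
If you have any questions, please leave them in the comments.
See you in the next video.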

## main.py
"""Synthesizes speech from the input string of text or ssml.

Note: ssml must be well-formed according to:
    https://www.w3.org/TR/speech-synthesis/
"""

def make_voice(index, text):
    """Synthesizes `text` with a ja-JP WaveNet voice and saves it as an MP3.

    Uses the `types` and `enums` modules of the pre-2.0
    google-cloud-texttospeech client library.
    """
    import os

    from google.cloud import texttospeech

    # Instantiates a client
    client = texttospeech.TextToSpeechClient()
    # voice_name = "ja-JP-Standard-D"
    voice_name = "ja-JP-Wavenet-D"

    # Set the text input to be synthesized
    synthesis_input = texttospeech.types.SynthesisInput(text=text)

    # Build the voice request: select the voice by name, together with the
    # language code ("ja-JP") and the ssml voice gender (male)
    voice = texttospeech.types.VoiceSelectionParams(
        name=voice_name,
        language_code='ja-JP',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.MALE)

    # Select the type of audio file you want returned
    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3
        )

    # Perform the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(synthesis_input, voice, audio_config)

    # The response's audio_content is binary. Create the output directory
    # first so that open() does not fail when it is missing.
    os.makedirs('outputs', exist_ok=True)
    f_name = 'outputs/{}_{}.mp3'.format(index, text)
    with open(f_name, 'wb') as out:
        # Write the response to the output file.
        out.write(response.audio_content)
        print('Audio content written to file {}'.format(f_name))

def list_voices():
    """Lists the available ja-JP voices.

    Sample output:

    Name: ja-JP-Wavenet-A
    Supported language: ja-JP
    SSML Voice Gender: FEMALE
    Natural Sample Rate Hertz: 24000

    Name: ja-JP-Wavenet-B
    Supported language: ja-JP
    SSML Voice Gender: FEMALE
    Natural Sample Rate Hertz: 24000

    Name: ja-JP-Wavenet-C
    Supported language: ja-JP
    SSML Voice Gender: MALE
    Natural Sample Rate Hertz: 24000

    Name: ja-JP-Wavenet-D
    Supported language: ja-JP
    SSML Voice Gender: MALE
    Natural Sample Rate Hertz: 24000

    Name: ja-JP-Standard-A
    Supported language: ja-JP
    SSML Voice Gender: FEMALE
    Natural Sample Rate Hertz: 24000

    Name: ja-JP-Standard-C
    Supported language: ja-JP
    SSML Voice Gender: MALE
    Natural Sample Rate Hertz: 24000

    Name: ja-JP-Standard-B
    Supported language: ja-JP
    SSML Voice Gender: FEMALE
    Natural Sample Rate Hertz: 24000

    Name: ja-JP-Standard-D
    Supported language: ja-JP
    SSML Voice Gender: MALE
    Natural Sample Rate Hertz: 24000
    """
    from google.cloud import texttospeech
    from google.cloud.texttospeech import enums
    client = texttospeech.TextToSpeechClient()

    # Performs the list voices request
    voices = client.list_voices()

    for voice in voices.voices:
        # Skip voices whose primary language is not Japanese
        if voice.language_codes[0] != "ja-JP":
            continue

        # Display the voice's name. Example: ja-JP-Wavenet-A
        print('Name: {}'.format(voice.name))

        # Display the supported language codes for this voice. Example: "en-US"
        for language_code in voice.language_codes:
            print('Supported language: {}'.format(language_code))

        ssml_gender = enums.SsmlVoiceGender(voice.ssml_gender)

        # Display the SSML Voice Gender
        print('SSML Voice Gender: {}'.format(ssml_gender.name))

        # Display the natural sample rate hertz for this voice. Example: 24000
        print('Natural Sample Rate Hertz: {}\n'.format(
            voice.natural_sample_rate_hertz))

if __name__ == '__main__':
    # list_voices()

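    # Read the script as UTF-8 and skip blank lines, since an empty line
    # has nothing to synthesize.
    with open("daihon.txt", "r", encoding="utf-8") as f:
        sentences = [x.strip() for x in f if x.strip()]
    # print(sentences)
    # sentences = ["Hello", "I'm Naoya, an engineer", "Today we are going to do some programming."]

    # Synthesize one MP3 per line, numbered in script order
    for index, s in enumerate(sentences):
        make_voice(index=index, text=s)
        # break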
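
The script quotes a price of 16 dollars per one million characters, and main.py sends one request per line of daihon.txt. Before running the real synthesis, it may be useful to count the characters and get a rough cost figure. The following is a minimal sketch, not part of the original repository: the helper names `load_sentences` and `estimate_cost_usd` are hypothetical, the price constant is simply the figure from the script, and counting characters with `len()` is an assumption that may not match how the service actually bills.

```python
from pathlib import Path

# Assumption: the price quoted in the script (USD per 1,000,000 characters).
PRICE_PER_MILLION_CHARS_USD = 16.0


def load_sentences(text):
    """Split a script into stripped, non-empty lines (one request per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def estimate_cost_usd(sentences, price_per_million=PRICE_PER_MILLION_CHARS_USD):
    """Return (character_count, estimated_cost_in_usd) for the sentences.

    Assumption: every character of every line is billed once.
    """
    total_chars = sum(len(s) for s in sentences)
    return total_chars, total_chars * price_per_million / 1_000_000


if __name__ == "__main__":
    script = Path("daihon.txt")
    if script.exists():
        sentences = load_sentences(script.read_text(encoding="utf-8"))
        chars, cost = estimate_cost_usd(sentences)
        print("{} lines, {} characters, about ${:.4f}".format(
            len(sentences), chars, cost))
```

Run next to daihon.txt, this prints the line count, the character count, and the estimated cost without calling the API, so no credentials are needed.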