@nzlz
Created March 18, 2023 06:20
# pip3 install git+https://github.com/espnet/espnet
# pip3 install "espnet[all]"  <--- dependencies; not sure whether espnet needs to be reinstalled from source after this, just make sure it's the March 1 or later release
# pip3 install pyopenjtalk
import pyaudio
from espnet2.bin.tts_inference import Text2Speech
import numpy as np
import soundfile as sf
from pydub import AudioSegment, playback
# Initialize the pre-trained TTS model
text2speech = Text2Speech.from_pretrained("espnet/kan-bayashi_tsukuyomi_full_band_vits_prosody")
# The sample rate comes from the model itself (text2speech.fs), so no separate setting is needed
while True:
    # Take user input for Japanese text to generate speech from
    text_input = input("Enter Japanese text to generate speech: ")
    # Generate speech from the input text using the pre-trained TTS model
    wav = text2speech(text_input)["wav"]
    # Save as 16-bit PCM at the model's sample rate
    sf.write("/tmp/out.wav", wav.numpy(), text2speech.fs, "PCM_16")
    # Play the saved file
    audio_file = AudioSegment.from_file("/tmp/out.wav", format="wav")
    playback.play(audio_file)
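
The script above saves the waveform as 16-bit PCM ("PCM_16") before playing it. As a rough illustration of what that step involves, here is a minimal standard-library sketch that turns float samples into a mono 16-bit WAV held in memory. The helper names `float_to_pcm16` and `wav_bytes` are hypothetical, and the scaling by 32767 with clipping is an assumption for illustration, not necessarily what soundfile does internally.

```python
import io
import struct
import wave


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    ints = []
    for s in samples:
        # Clip first so out-of-range values cannot overflow int16.
        s = max(-1.0, min(1.0, float(s)))
        ints.append(int(round(s * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)


def wav_bytes(samples, sample_rate):
    """Build a mono 16-bit WAV file in memory and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 2 bytes per sample = 16 bit
        w.setframerate(sample_rate)
        w.writeframes(float_to_pcm16(samples))
    return buf.getvalue()
```

In the loop above, something like `wav_bytes(wav.tolist(), text2speech.fs)` wrapped in `io.BytesIO` could probably be handed to `AudioSegment.from_file` in place of the `/tmp/out.wav` path, which would avoid the temporary file. That usage is untested here.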

nzlz commented Mar 18, 2023

AbeShinzo0708/ESPnet_VITS_SugaYoshihide

Japanese prime minister (Suga Yoshihide) voice model.
