@0xdevalias
Last active April 15, 2024 06:51
Some notes on Singing Voice Synthesizers (eg. Vocaloid, etc)

Singing Voice Synthesizers (eg. Vocaloid, etc)

Software/etc

Suno (Chirp, Bark)

Vocaloid

UTAU / OpenUTAU

  • https://en.wikipedia.org/wiki/Utau
    • UTAU is a Japanese singing synthesizer application created by Ameya/Ayame (飴屋/菖蒲). The program is similar to the VOCALOID software, the difference being that it is shareware rather than distributed under third-party licensing.

    • OpenUTAU is an open-source unofficial successor to UTAU developed by Vocaloid producer StAkira, with a beta released in November 2021. The software was designed to be compatible with UTAU but with a modern user experience. Unlike UTAU, it does not require a Japanese system locale to function properly.

  • https://alternativeto.net/software/utau/about/
    • UTAU or Vocal Synthesizer Tool UTAU is a voice-synthesis software developed by Ameya/Ayame and made for the Windows operating system.

  • https://vocalsynth.fandom.com/wiki/UTAU
  • https://vocalsynth.fandom.com/wiki/OpenUtau
  • https://github.com/stakira/OpenUtau
    • Open singing synthesis platform / Open source UTAU successor

    • OpenUtau is a free, open-source editor made for the UTAU community.

    • https://www.openutau.com/
  • http://utau.us/
    • http://utau.us/install.html
    • http://utau.us/vb.html
      • A "voicebank" refers to the sound library. UTAU itself doesn't come with any voicebanks preinstalled besides "Defoko", a robotic voice made from the synthesis software AquesTone. However, thousands of voicebanks can be downloaded and used.

    • http://utau.us/midi.html
      • UST making is a very useful skill to have in the UTAU community. It gives you the flexibility to make your UTAUloid sing what you want, when you want, allows more collaborative opportunities with other members, and lends you the ability to help those out who can’t UST themselves. UST making generally is divided into 2 parts. The MIDI, and then the making of the UST within the UTAU program. Some UST makers choose to do it all within the UTAU program, but that’s much more difficult and the majority do it by creating a MIDI file first.

    • http://utau.us/link.html
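
The MIDI-first workflow quoted above (write the melody, then turn it into a UST) can be pictured with a small sketch. This is only an illustration, not UTAU's own exporter: the `melody_to_ust` helper is hypothetical, and the section and key names it writes (`[#SETTING]`, `Length`, `Lyric`, `NoteNum`, 480 ticks per quarter note) are an assumption about the UST layout that should be checked against a file saved by UTAU itself.

```python
# Hypothetical helper: turn a melody into UST-style text.
# ASSUMPTION: the section/key names and the 480-tick resolution below
# reflect the usual UST layout; verify against a file saved by UTAU.

TICKS_PER_QUARTER = 480  # assumed UST timing resolution


def melody_to_ust(notes, tempo=120.0):
    """notes: list of (midi_note_number, length_in_quarter_notes, lyric)."""
    lines = ["[#SETTING]", f"Tempo={tempo:.2f}", "Tracks=1"]
    for index, (midi_note, quarters, lyric) in enumerate(notes):
        lines.append(f"[#{index:04d}]")  # one numbered section per note
        lines.append(f"Length={int(round(quarters * TICKS_PER_QUARTER))}")
        lines.append(f"Lyric={lyric}")
        lines.append(f"NoteNum={midi_note}")  # same numbering as MIDI notes
    lines.append("[#TRACKEND]")
    return "\n".join(lines) + "\n"


# Three notes: C4 and D4 for one beat each, then E4 for two beats.
melody = [(60, 1.0, "do"), (62, 1.0, "re"), (64, 2.0, "mi")]
ust_text = melody_to_ust(melody)
print(ust_text)
```

A real project file carries more settings (voicebank path, version header, text encoding), so treat the output as a starting point to compare with a genuine UST rather than something to load directly.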

Alter/Ego

  • https://en.wikipedia.org/wiki/Alter/Ego
    • Alter/Ego is a text-to-speech synthesizer which aims to create more modern vocals, working to post 1990s research. It was offered as a free plug-in and is used for music making to produce singing vocals. It operates in a similar manner to Chipspeech. Vocals are clean-cut though robotic sounding and the software is ideal for vocal experimentation. It is capable of running different speech engines.

  • https://alternativeto.net/software/alter-ego-1/about/
    • A vocal synthesizer that can be used as a standalone app or a VST plugin.

    • Alter/Ego is based on the award winning technology featured in chipspeech but instead of targeting Vintage voice technology, it focuses on more ‘modern’ (1990+) singing synthesis algorithms and research. It is specially tailored for musical needs – simply type in your lyrics, and then play on your MIDI keyboard. It’s a true synthesizer, the sound can be extensively modified for easy and expressive performances.

  • https://www.plogue.com/products/alter-ego.html
    • Alter/Ego :: real-time singing synthesizer. Alter/Ego is based on the award winning technology featured in chipspeech but instead of targeting Vintage voice technology, it focuses on more ‘modern’ (1990+) singing synthesis algorithms and research.

      It is specially tailored for musical needs – simply type in your lyrics, and then play on your MIDI keyboard.

      It’s a true synthesizer, the sound can be extensively modified for easy and expressive performances.

    • https://www.plogue.com/products/alter-ego.html#specs_requirements
      • Windows/macOS: Standalone App, VST/VST3, AU (only on mac), ProTools AAX
    • https://www.plogue.com/products/voice-banks.html
      • Alter/Ego Voice Banks: Here is the repository where you can find all the voices for Plogue Alter/Ego.

        • MARIE ORK 2: Marie Ork is a cyborg, a goblin and a witch, which explains her cybernetical-magical ability to produce so many different sounds with her voice. Her favorite human vocalist is Edith Piaf. She can be used as a virtual death metal vocalist, a source of monster voices for game and film audio, and a synthesizer which generates strange textures. Created by Karoryfer Lecolds and tora ouji.

        • ALYS: ALYS is the first French-Japanese virtual singer. She has been developed by VoxWave. She is a 21-year-old young lady with dark blue hair, 165cm tall, and her weight is 54kg. She can sing in both French and Japanese thanks to her integration in the voice synthesis software Alter/Ego. But ALYS's universe goes beyond music only. She has been designed to break boundaries between various artistic sectors and media. You’ll be able to make what you want of her, in the sector you want.

        • BONES: Bones is the new male voice created for Alter/Ego by tora ouji. He can sing in both English and Japanese and speaks English as well.

Synthesizer V

  • https://alternativeto.net/software/synthesizer-v/about/
    • Synthesizer V is a vocal synthesizer developed by Kanru Hua, aiming for "artistic perfection of artificial voices".

    • Introducing Synthesizer V, the Revolutionary Singing Synthesizer. We aim to create a tool that ultimately leads to the artistic perfection of artificial voices, and the passion has been put into a 7-year quest for the scientific modeling of singing voice at the highest calibre. Synthesizer V, coming straight from the laboratory, is the outcome at the fifth iteration of this research project. Based on a hybrid of artificial neural networks and concatenative synthesis, Synthesizer V delivers natural voice from a small amount of sampled data. Our patent-pending Low Level Speech Model (LLSM) separately processes vocal folds and vocal tract features, thereby allowing for high-fidelity and flexible manipulation of voice timbre.

  • https://vocalsynth.fandom.com/wiki/Synthesizer_V
    • Synthesizer V is a singing synthesis software created by Dreamtonics Co., Ltd.‎

  • https://dreamtonics.com/synthesizerv/
    • A music producer’s dream, our pioneering synthesizer faithfully replicates the nuances of the human singing voice – without limiting your vocabulary. With access to customizable, realistic vocals at your fingertips, you can bring your idea to life with Synthesizer V.

    • https://synthesizerv.com/web/
      • Announcing Web Synthesizer V. For the first time, a full-fledged singing editor is running in the browser.

    • https://store.dreamtonics.com/product/editor-svstudio-pro/
      • Synthesizer V Studio Pro: US$89
      • System Requirements: Windows, macOS, Linux
      • Synthesizer V Studio Pro is the flagship singing synthesis software developed by Dreamtonics.

        The software combines an intuitive and flexible user interface with a powerful singing synthesis engine backed by cutting-edge technologies. Users can easily create realistic-sounding vocal covers or original songs by simply sketching out a melody and filling in the lyrics.

      • Synthesizer V Studio is available in two editions: the Pro edition (this product) and a free Basic edition that comes with voice database purchases.

        • While having limitations on the number of tracks and rendering speed, the Basic version is good for first-time users as a learning platform, or anyone who would like to try out Synthesizer V Studio before making a purchase decision.
        • The Pro version has essential features for music production: availability as a VSTi plugin, higher levels of automation brought by advanced AI features, and cross-lingual synthesis, to name a few.
      • All users of Synthesizer V Studio Pro also receive the free vocal Mai. Mai is a bright and energetic feminine pop vocalist originally recorded in Japanese, but able to sing clearly in English and Chinese using the cross lingual synthesis function. With Emotional and Soft vocal modes, Mai’s singing style can be adapted to a wide variety of productions. Like other Dreamtonics voices, there is no restriction on the monetization of songs created with Mai.

    • https://store.dreamtonics.com/product-category/voice-database/
      • Voice Database

      • These seem to cost about US$79 each

Unsorted

  • https://vocadb.net/
    • Welcome to the Vocaloid Database! The collaborative database for Vocaloid, UTAU and other singing synthesizers, with artists, discography, PVs and more.

  • https://alternativeto.net/software/vocaloid-3/
    • UTAU
      • See notes/links/etc in above sections
    • OpenUTAU
      • See notes/links/etc in above sections
    • Synthesizer V
      • See notes/links/etc in above sections
    • Alter/Ego
      • See notes/links/etc in above sections
    • Emvoice:
      • Vocal creation and arrangement tool unlike anything on the market.

      • We're building a Text-to-Voice engine that can sing and speak expressively. We're giving music producers access to the only virtual instrument they lack: the human voice. Having a vocal on a track can massively increase its commercial potential. To put it simply, successful music is almost exclusively vocal. Yet, most music producers now lack the time, money, or adequate conditions to make vocal music. Any major pop song production contains 40 to 50 vocal tracks, or more. Giving every producer access to high-quality voices has the potential to change the way music is produced, and by whom. Emvoice One has a unique ability: it can produce singing and speaking voices. It will be the testbed for our core technology.

      • https://emvoiceapp.com/
        • VST/AU/AAX - Mac/PC - Internet connection required for use - Purchase voices in-app

        • Voices seem to cost ~US$50-80ish
    • Sinsy: https://alternativeto.net/software/sinsy/about/
      • Singing voice synthesis system based on HMM algorithm

      • Sinsy generates audio files from lyrics-annotated MusicXML scores. A web service demo is available online, featuring a more recent version than the one found in the official public repository.

      • https://www.sinsy.jp/
      • https://github.com/mathigatti/midi2voice
        • Singing Synthesis from MIDI file. This script relies on the sinsy.jp website from the Nagoya Institute of Technology, which implements an HMM-based Singing Voice Synthesis System.
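
Since Sinsy takes lyrics-annotated MusicXML as input, a minimal sketch of what such a score looks like may help. The code below uses only Python's standard `xml.etree.ElementTree` and standard MusicXML element names; the `build_score` helper is hypothetical, and whether Sinsy accepts a file this bare (no tempo, time signature or clef) is an assumption that has not been checked.

```python
import xml.etree.ElementTree as ET


def build_score(notes, divisions=1):
    """Build a one-measure MusicXML score.

    notes: list of (step, octave, duration_in_divisions, syllable).
    With divisions=1, a duration of 1 is a quarter note.
    """
    score = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Voice"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = str(divisions)

    for step, octave, duration, syllable in notes:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(duration)
        # The <lyric> child is what makes the score "lyrics annotated".
        lyric = ET.SubElement(note, "lyric")
        ET.SubElement(lyric, "syllabic").text = "single"
        ET.SubElement(lyric, "text").text = syllable
    return score


score = build_score([("C", 4, 1, "la"), ("D", 4, 1, "la"), ("E", 4, 2, "la")])
xml_text = ET.tostring(score, encoding="unicode")
print(xml_text)
```

Notation editors that export MusicXML produce much richer files; the point here is only that each note carries its pitch, its duration and the syllable to be sung.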

    • DeepVocal: https://alternativeto.net/software/deepvocal/about/
    • RenoidPlayer: https://alternativeto.net/software/renoidplayer/about/
      • RenoidPlayer is a web-based, free vocal synthesizer that runs on Renoise XRNI and SoundFont SF2.

      • https://www.g200kg.com/renoid/
      • https://vocadb.net/T/7621/renoid
        • Released on September 12th, 2012, Renoid is a free online-based singing synthesis engine developed by Sato-san. It can either be played within Renoise, in a different DAW via SoundFont, or used in-browser via Renoid Player. Nine voicebanks exist, four of which (Honoka Mei, Asane Bow, Nagone Mako, and Kasane Teto) were originally from UTAU and five of which (Nina, Jutero, Hana, Qurio, and Robozawa-200kg) were made directly for the program.

      • https://vocadb.net/T/8108/renoise
        • Renoise is a Digital Audio Workstation (DAW) intended for music creation. Through the use of plugins, Renoise can also support vocal synthesis such as those using Renoid voicebanks.

          • https://www.renoise.com/
            • Renoise is a complete, multi-platform and expandable Digital Audio Workstation. It lets you record, compose, edit, process and render production-quality audio using a music-tracker based approach.

    • Chipspeech: https://alternativeto.net/software/chipspeech/about/
      • chipspeech is a vocal synthesizer that recreates vintage vocal synths from the 1980s, developed by Plogue.

      • chipspeech is a vintage-style speech synthesizer which recreates the sound of famous 80's voice synthesis chips. It features 12 different voices, each with its own characteristic timbre. It is specially tailored for musical needs – simply type in your lyrics, and then play on your MIDI keyboard. It’s a true synthesizer, the sound can be extensively modified for easy and expressive performances. chipspeech also features a circuit bending emulation, letting you not only recreate the insane and chaotic sound of a circuit bent TI speaking device, but also use it in a controlled, musical way.

    • eCantorix: https://alternativeto.net/software/ecantorix/about/
      • eCantorix is a singing synthesis frontend for espeak. It works by using espeak to generate raw speech samples, then adjusting their pitch and length and finally creating an LMMS project file referencing the samples in sync to the input file.

      • https://github.com/divVerent/ecantorix
        • Singing synthesis frontend for espeak

        • eCantorix is a singing synthesis frontend for espeak. It works by using espeak to generate raw speech samples, then adjusting their pitch and length and finally creating an LMMS project file referencing the samples in sync to the input file.

      • https://espeak.sourceforge.net/
        • eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

      • https://github.com/espeak-ng/espeak-ng
        • eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

        • eSpeak NG is a compact open source software text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents. It is based on the eSpeak engine created by Jonathan Duddington. eSpeak NG uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings. It also supports Klatt formant synthesis, and the ability to use MBROLA as a backend speech synthesizer.
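
The eCantorix description (generate a speech sample, then adjust its pitch and length to fit a note) can be illustrated with a rough sketch. This is not eCantorix's actual code: it swaps in naive linear-interpolation resampling, which changes pitch and length together, followed by trimming or zero-padding to the note length. A sine wave stands in for the espeak output, and all function names are hypothetical.

```python
import math


def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def resample(samples, ratio):
    """Naive linear-interpolation resampling.

    Reading the input `ratio` times faster raises the pitch by that ratio
    and shortens the result by the same factor.
    """
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def fit_length(samples, target_len):
    """Trim or zero-pad so the result has exactly target_len samples."""
    return samples[:target_len] + [0.0] * max(0, target_len - len(samples))


SAMPLE_RATE = 8000
# Stand-in for a speech sample: one second of a 220 Hz sine.
source = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

shifted = resample(source, semitone_ratio(12))  # up one octave
note = fit_length(shifted, SAMPLE_RATE // 2)    # hold for half a second
print(len(shifted), len(note))
```

Real singing synthesis usually needs to change pitch without changing duration (and vice versa), which plain resampling cannot do; the sketch only shows the two adjustments the description names.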

  • https://www.reddit.com/r/opensource/comments/lrzq3s/is_there_a_foss_equivalent_to_vocaloidutau/
    • Sinsy
      • See notes/links/etc in above sections
    • OpenUtau
      • See notes/links/etc in above sections
  • https://www.quora.com/Is-there-a-cheap-maybe-open-source-alternative-to-vocaloid-voice-synthesizer
    • UTAU
      • See notes/links/etc in above sections
    • Alter/Ego
      • See notes/links/etc in above sections
    • CeVIO Creative Studio
      • See notes/links/etc in the CeVIO entry below
    • Synthesizer V
      • See notes/links/etc in above sections
    • Sharpkey: https://deepvocal.fandom.com/wiki/Sharpkey
      • Sharpkey was a singing synthesizer application software developed by Boxstar. It was replaced by the Sharpkey Galaxy software, which was then replaced by the now modern DeepVocal.

    • MUTA
      • https://vocadb.net/T/3247/muta
        • MUTA is a discontinued Chinese synthesizer developed by 厦门优他动漫科技有限公司. As a whole, the program had five vocals: Yan Xi, Hupo Xu Yan, Weiyang, Liliko, and Xin Ge Ping; the latter two were only available through the mobile app. Three more voicebanks had been announced (Xi Sheng, Li Xiang, and You Hui), but only Xi Sheng ever received demos. MUTA has since ended development.

      • https://muta.fandom.com/wiki/MUTA_Wiki
        • MUTA is a Chinese voice synthesizing program that was developed and released by AI-MUTA in 2015. The synthesizer allows anyone to create music from their computer in seconds!

    • NIAONiao: https://vocalsynth.fandom.com/wiki/NIAONiao
      • NIAONiao Virtual Singer (袅袅虚拟歌手, Niǎoniǎo Xūnǐ Gēshǒu) is a Chinese voice synthesizer program developed by dsound. The default voicebank is named Yu Niaoniao (余袅袅), however, users can create their own voicebank and take advantage of its larger file feature. NIAONiao can import MIDI files, VSQX files (VOCALOID3 only), and UST files, export tracks as the "Niao" file format (*.nn), and can render vocal tracks directly as WAV, MP3, or MIDI files.

    • VOCALINA: https://vocalsynth.fandom.com/wiki/VOCALINA
      • VOCALINA (보카리나) is a "text to speech" singing synthesizer and DAW for personal music-related content. It was developed in October 2011 by TGENS Co., Ltd. On September 25, 2017, it was announced by the CEO (Kang Woo-Mo) that VOCALINA's Service would be terminated on October 1, 2017. Contrary to this, on the 29th of September in 2017, the CEO announced they would work to extend the service for one more year, as thanks to the community for their encouragement and support. Additionally, VOCALINA would be free to use until the service expired.

    • DeepVocal
      • See notes/links/etc in above sections
    • Nakloid
      • See notes/links/etc in above sections
    • Cadencii
      • See notes/links/etc in above sections
  • https://vocalsynth.fandom.com/wiki/Vocal_Synthesizer_Wiki
    • Lists a few synths on the front page:
      • AITalk: https://vocalsynth.fandom.com/wiki/AITalk
        • AITalk (エーアイトーク) is a speech synthesis program created by AI, Inc. It has been used as the base for VOICEROID, galaco Talk, Otomachi Una Talk EX, Gynoid Talk, and A.I.VOICE.

      • AquesTone: https://vocalsynth.fandom.com/wiki/AquesTone
        • AquesTone is a VSTi plugin developed by Aquest. There are four voice options: Female F1, Auto F1, Male HK, and Auto HK. UTAU Utane Uta (aka. Defoko) uses Female voice 1 as a source for the UTAU default voicebank.

      • CANTOR: https://vocalsynth.fandom.com/wiki/CANTOR
      • CeVIO: https://vocalsynth.fandom.com/wiki/CeVIO
        • CeVIO is a speech and singing synthesis software created by Frontier Works, Inc.

        • https://cevio.jp/
        • CeVIO Creative Studio: https://vocalsynth.fandom.com/wiki/CeVIO_Creative_Studio
          • CeVIO Creative Studio is a program created by Frontier Works, Inc. It is capable of both speaking and singing.

        • CeVIO AI: https://vocalsynth.fandom.com/wiki/CeVIO_AI
          • CeVIO AI is the successor to CeVIO Creative Studio. It was announced in December 2018 and is capable of both speaking and singing and utilizes deep learning technology.

        • CeVIO Pro (aka VoiSona): https://vocalsynth.fandom.com/wiki/VoiSona
          • VoiSona, formerly CeVIO Pro (チェビオ Pro), is a digital audio workstation (DAW), VSTi-compatible, commercial vocal synthesizer product capable of both speech and singing synthesis, and is the next generation of the CeVIO voice synthesis technology developed by the CeVIO Project in collaboration with Techno-Speech, Inc. The alpha (α) version of CeVIO Pro is slated to be released on February 24, 2022 for free download.

          • https://voisona.com/
            • Free AI Singing Software. VoiSona AI voice synthesis software realistically reproduces human singing. Developed and refined over many years, its AI technology produces superior synthesized voice quality. Available on both Windows and macOS with VSTi/Audio Units (AU) support, VoiSona is now easier to use and supports the needs of professionals. VoiSona’s first artist is Chis-A. She is a magnetic singer with an androgynous voice. Chis-A is included in VoiSona’s default voice library under a user-friendly license. You will find her valuable to your creative endeavors.

            • CeVIO Creative Studio- and CeVIO AI-exclusive voice libraries cannot be used with VoiSona at present. Cross-platform support for each voice library is being considered.

            • VoiSona is available on Windows and macOS. You can add VoiSona to your DAW as a VSTi plug-in or an Audio Unit (AU) plug-in. Use VoiSona in your DAW for all processes from inputting score data to adjusting the singing voice to mixing. VoiSona is also available as a standalone application for use without a DAW.

          • https://cevio.fandom.com/wiki/VoiSona
            • VoiSona, formerly known as CeVIO Pro (チェビオ Pro (仮)), is an audio workstation (DAW), VSTi-compatible, commercial vocal synthesizer software that reproduces realistic singing voices with AI technology, and is the sister brand of the CeVIO voice synthesis technology developed by Techno-Speech, Inc. in collaboration with CeVIO Project. The alpha (α) version known as CeVIO Pro was released on February 24, 2022 for free download. The beta (β) version was released on June 2, 2022 under the new name "VoiSona", with the official production version released on September 1, 2022.

      • DeepVocal
        • See notes/links/etc in above sections
      • ETHERA
        • TODO
      • NEUTRINO: https://vocalsynth.fandom.com/wiki/NEUTRINO
        • NEUTRINO is a Japanese neural voice synthesizer program developed by SHACHI. It is compatible with Windows, macOS, and Linux. Web browser compatibility is based in Google Drive.

        • The user uploads data in the MusicXML format, which the NEUTRINO program reads to output a WAV file of the generated voice. Gender factor, vibrato intensity, and pitch shift can be adjusted prior to output.

        • https://studio-neutrino.com/
          • NEUTRINO Diffusion: AI Singing Voice Generator

          • https://studio-neutrino.com/#library
            • NEUTRINO SINGER LIBRARY: A unique singing voice library. As of July 2023, we have published the singing voice libraries of 13 people.

          • https://studio-neutrino.com/blog/
            • https://studio-neutrino.com/1591/
              • NEUTRINO Diffusion – Muon v2.x update. One year has passed since the official release of NEUTRINO, and it has evolved into the second generation (Muon). We have carried out a complete renewal of the algorithm and model. The second generation (Muon) uses the latest generation AI model, the Diffusion model, which makes it possible to generate voices with a higher sense of realism and rich singing expression. It has also been improved in terms of functionality, with the voice changing every time it makes an inference, and the quality and processing speed can be changed. I hope this will be of help to you in your production.

            • https://studio-neutrino.com/535/
              • NEUTRINO Diffusion Download

              • All versions, including past versions, can be downloaded from this page. When using, after downloading the NEUTRINO main unit, please download the singing voice library separately and decompress and copy it.

      • Piapro Studio: https://vocalsynth.fandom.com/wiki/Piapro_Studio
        • Piapro Studio (ピアプロスタジオ) is a VSTi produced and developed by Crypton Future Media, Inc. It was meant for use in a DAW (digital audio workstation), but has since received standalone releases. Piapro Studio was first packaged with KAITO V3 and has been released with almost every Crypton release since. Piapro Studio originally used VOCALOID as its core synthesizing engine, but later went on to become its own independent software.

        • https://piaprostudio.com/
      • Synthesizer V
        • See notes/links/etc in above sections
      • UTAU
        • See notes/links/etc in above sections
      • VOCALOID
        • See notes/links/etc in above sections
    • https://vocalsynth.fandom.com/wiki/Category:Software
      • Has 345 entries at time of writing
    • https://vocalsynth.fandom.com/wiki/Category:Open_source
      • VOICEVOX: https://vocalsynth.fandom.com/wiki/VOICEVOX
        • VOICEVOX is an open source, deep learning, reading/text-to-speech synthesizer software developed by Hiho.

      • OpenUtau
        • See notes/links/etc in above sections
      • TALQu: https://vocalsynth.fandom.com/wiki/TALQu
        • TALQu (とーく) is an open source, deep learning, reading/text-to-speech synthesizer software developed by Haruqa Software (Haruqaのソフトウェアとか).

      • ALYS: https://vocalsynth.fandom.com/wiki/ALYS
        • ALYS is a vocal for Alter/Ego. She is the second vocal produced for the software, the first French vocal, and was developed by VoxWave. In addition to her French vocal she also has a Japanese vocal. ALYS' voice is provided by the French Utaite, Poucet, and is illustrated by Saphirya. In December 2021, ALYS became an open-source project and both her Alter/Ego and her prototype UTAU voicebanks were released.

  • https://github.com/mathigatti/RealTimeSingingSynthesizer
    • Live Coding Singing Synthesizer. Real Time Singing Synthesizer project made from sinsy-NG. The idea was to generate vocal audio samples in real time easily for live coding performances.

    • https://github.com/Aozhi/Sinsy-NG
      • The HMM-Based English&Japanese&Chinese Singing Synthesis with espeak-ng

    • https://github.com/GloomyGrave/Sinsy-NG
      • (discontinued) 🎵The Formant-Based All Language Singing Voice Synthesis System: Sinsy-NG

  • https://github.com/topics/vocaloid
  • https://github.com/facebookresearch/audiocraft
    • Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

  • https://arxiv.org/abs/2301.11325
    • MusicLM: Generating Music From Text

See Also

My Other Related Deepdive Gists and Projects
