0xdevalias/singing-voice-synthesizers.md

    Singing Voice Synthesizers (eg. Vocaloid, etc)

Some notes on Singing Voice Synthesizers (eg. Vocaloid, etc)
Table of Contents


Software/etc

Suno (Chirp, Bark)
Vocaloid
UTAU / OpenUTAU
Alter/Ego
Synthesizer V


Unsorted
See Also

My Other Related Deepdive Gists and Projects


Software/etc

Suno (Chirp, Bark)


https://www.suno.ai/


Create music and speech with AI


Our AI models enable creatives and developers to generate hyper-realistic speech, music and sound effects — powering personalized, interactive and fun experiences across gaming, social media, entertainment and more.


https://suno.ai/discord
Chirp v1 examples: https://suno-ai.notion.site/Chirp-v1-Examples-cc71e6c0c79f4e03acf39aa5d5a3dd09
Chirp v0 examples: https://suno-ai.notion.site/Chirp-v0-Examples-f05351485da74d769d6183220a6e5da7
Bark v0 examples: https://suno-ai.notion.site/Bark-v0-Examples-e572bcfcdf65429c916d4c6dd8ae175b


https://github.com/suno-ai/bark


Text-Prompted Generative Audio Model


Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.
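
As a rough illustration of using Bark for singing rather than speech: the Bark README suggests wrapping lyrics in ♪ symbols to nudge the model towards song. The sketch below assumes the `bark` package and `scipy` are installed; `singing_prompt` and `synthesize` are hypothetical helper names for this note, not part of Bark itself.

```python
def singing_prompt(lyrics: str) -> str:
    """Wrap lyrics in note symbols, which the Bark README suggests for song lyrics."""
    return f"♪ {lyrics.strip()} ♪"


def synthesize(prompt: str, out_path: str = "bark_out.wav") -> str:
    """Render a prompt to a WAV file (model checkpoints are downloaded on first use)."""
    # Imported lazily so the prompt helper works without the heavy dependencies.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()
    audio = generate_audio(prompt)  # NumPy array of audio samples
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path
```

Calling `synthesize(singing_prompt("Twinkle twinkle little star"))` should then write `bark_out.wav`. Results vary from run to run, and the model may still speak rather than sing.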


2023.05.01

Bark is now licensed under the MIT License, meaning it's now available for commercial use!


Bark Speaker Library (v2): https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c


Vocaloid


https://vocalsynth.fandom.com/wiki/VOCALOID


VOCALOID (ボーカロイド) is singing synthesis software created by YAMAHA Corporation.


https://www.vocaloid.com/en/

https://www.vocaloid.com/en/products

https://www.vocaloid.com/en/vocaloid6/buy/

US$225, Windows/macOS, includes 8 voices, VST/AU/ARA2
Vocaloid 6 Walkthrough: https://www.youtube.com/watch?v=Zc3h9iKYKDs
https://www.vocaloid.com/en/vocaloid6/trial/


Download VOCALOID6 now to experience this amazing world of vocal sounds for yourself! The free trial version lets you enjoy all of VOCALOID6’s features for 31 days, available now.


https://www.vocaloid.com/en/learn/


UTAU / OpenUTAU


https://en.wikipedia.org/wiki/Utau


UTAU is a Japanese singing synthesizer application created by Ameya/Ayame (飴屋／菖蒲). This program is similar to the VOCALOID software, with the difference being that it is shareware rather than being released under third-party licensing.


OpenUTAU is an open-source unofficial successor to UTAU developed by Vocaloid producer StAkira, with a beta released in November 2021. The software was designed to be compatible with UTAU but with a modern user experience. Unlike UTAU, it does not require a Japanese system locale to function properly.


https://alternativeto.net/software/utau/about/


UTAU or Vocal Synthesizer Tool UTAU is a voice-synthesis software developed by Ameya/Ayame and made for the Windows operating system.


https://vocalsynth.fandom.com/wiki/UTAU
https://vocalsynth.fandom.com/wiki/OpenUtau
https://github.com/stakira/OpenUtau


Open singing synthesis platform / Open source UTAU successor


OpenUtau is a free, open-source editor made for the UTAU community.


https://www.openutau.com/


http://utau.us/

http://utau.us/install.html
http://utau.us/vb.html


A "voicebank" refers to the sound library. UTAU itself doesn't come with any voicebanks preinstalled besides "Defoko", a robotic voice made from the synthesis software AquesTone. However, thousands of voicebanks are available to be downloaded and used.


http://utau.us/midi.html


UST making is a very useful skill to have in the UTAU community.
It gives you the flexibility to make your UTAUloid sing what you want, when you want, allows more collaborative opportunities with other members, and lends you the ability to help those out who can’t UST themselves.
UST making generally is divided into 2 parts. The MIDI, and then the making of the UST within the UTAU program. Some UST makers choose to do it all within the UTAU program, but that’s much more difficult and the majority do it by creating a MIDI file first.
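
To make the "MIDI first, then UST" workflow above a little more concrete, here is a minimal sketch that turns a list of notes (as might be read from a MIDI file) into UST text. It assumes the commonly seen UST layout: INI-style sections, `Length` in ticks at 480 per quarter note, and `R` as the rest lyric. `notes_to_ust` is a hypothetical helper, and real UST files usually carry more settings (voicebank path, flags, and so on) and are often Shift-JIS encoded.

```python
from typing import Iterable, Tuple

TICKS_PER_QUARTER = 480  # assumed UST tick resolution

# (lyric, MIDI note number, length in quarter notes)
Note = Tuple[str, int, float]


def notes_to_ust(notes: Iterable[Note], tempo: float = 120.0) -> str:
    """Build a minimal UST document from a sequence of notes."""
    lines = ["[#VERSION]", "UST Version1.2", "[#SETTING]", f"Tempo={tempo:.2f}"]
    for index, (lyric, note_num, quarters) in enumerate(notes):
        lines += [
            f"[#{index:04d}]",  # one numbered section per note
            f"Length={round(quarters * TICKS_PER_QUARTER)}",
            f"Lyric={lyric}",
            f"NoteNum={note_num}",
        ]
    lines.append("[#TRACKEND]")
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    melody = [("a", 60, 1.0), ("a", 62, 1.0), ("R", 60, 0.5), ("a", 64, 1.5)]
    print(notes_to_ust(melody, tempo=100))
```

The note list is kept as plain tuples so that any MIDI reader can feed it; the MIDI parsing itself is left out here.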


http://utau.us/link.html


Alter/Ego


https://en.wikipedia.org/wiki/Alter/Ego


Alter/Ego is a text-to-speech synthesizer which aims to create more modern vocals, working to post 1990s research. It was offered as a free plug-in and is used for music making to produce singing vocals. It operates in a similar manner to Chipspeech. Vocals are clean-cut though robotic sounding and the software is ideal for vocal experimentation. It is capable of running different speech engines.


https://alternativeto.net/software/alter-ego-1/about/


A vocal synthesizer that can be used as a standalone app or a VST plugin.


Alter/Ego is based on the award winning technology featured in chipspeech but instead of targeting Vintage voice technology, it focuses on more ‘modern’ (1990+) singing synthesis algorithms and research.
It is specially tailored for musical needs – simply type in your lyrics, and then play on your MIDI keyboard.
It’s a true synthesizer, the sound can be extensively modified for easy and expressive performances.


https://www.plogue.com/products/alter-ego.html


Alter/Ego :: real-time singing synthesizer


https://www.plogue.com/products/alter-ego.html#specs_requirements

Windows/macOS: Standalone App, VST/VST3, AU (only on mac), ProTools AAX


https://www.plogue.com/products/voice-banks.html


Alter/Ego Voice Banks
Here is the repository where you can find all the voices for Plogue Alter/Ego.


MARIE ORK 2: Marie Ork is a cyborg, a goblin and a witch, which explains her cybernetical-magical ability to produce so many different sounds with her voice. Her favorite human vocalist is Edith Piaf. She can be used as a virtual death metal vocalist, a source of monster voices for game and film audio, and a synthesizer which generates strange textures. Created by Karoryfer Lecolds and tora ouji.


ALYS: ALYS is the first French-Japanese virtual singer. She has been developed by VoxWave. She is a 21 years old young lady with dark blue hair, 165cm tall and her weight is 54kg. She can sing in both French and Japanese thanks to her integration in the voice synthesis software Alter/Ego. But ALYS universe goes beyond music only. She has been thought to break boundaries between various artistic sectors and medias. You’ll be able to make what you want of her, in the sector you want.


BONES: Bones is the new male voice created for Alter/Ego by tora ouji. He can sing in both English and Japanese and speaks English as well.


Synthesizer V


https://alternativeto.net/software/synthesizer-v/about/


Synthesizer V is a vocal synthesizer developed by Kanru Hua, aiming for "artistic perfection of artificial voices".


Introducing Synthesizer V, the Revolutionary Singing Synthesizer
We aim to create a tool that ultimately leads to the artistic perfection of artificial voices, and the passion has been put into a 7-year quest for the scientific modeling of singing voice at the highest calibre. Synthesizer V, coming straight from the laboratory, is the outcome at the fifth iteration of this research project.
Based on a hybrid of artificial neural networks and concatenative synthesis, Synthesizer V delivers natural voice from a small amount of sampled data. Our patent-pending Low Level Speech Model (LLSM) separately processes vocal folds and vocal tract features, thereby allowing for high-fidelity and flexible manipulation of voice timbre.


https://vocalsynth.fandom.com/wiki/Synthesizer_V


Synthesizer V is singing synthesis software created by Dreamtonics Co., Ltd.


https://dreamtonics.com/synthesizerv/


A music producer’s dream, our pioneering synthesizer faithfully replicates the nuances of the human singing voice – without limiting your vocabulary. With access to customizable, realistic vocals at your fingertips, you can bring your idea to life with Synthesizer V.


https://synthesizerv.com/web/


Announcing Web Synthesizer V
For the first time, a full fledged singing editor is running in the browser.


https://store.dreamtonics.com/product/editor-svstudio-pro/

Synthesizer V Studio Pro: US$89
System Requirements: Windows, macOS, Linux


Synthesizer V Studio Pro is the flagship singing synthesis software developed by Dreamtonics.
The software combines an intuitive and flexible user interface with a powerful singing synthesis engine backed by cutting-edge technologies. Users can easily create realistic-sounding vocal covers or original songs by simply sketching out a melody and filling in the lyrics.


Synthesizer V Studio is available in two editions: the Pro edition (this product) and a free Basic edition that comes with voice database purchases.

While having limitations on the number of tracks and rendering speed, the Basic version is good for first-time users as a learning platform, or anyone who would like to try out Synthesizer V Studio before making a purchase decision.
The Pro version has essential features for music production: availability as a VSTi plugin, higher levels of automation brought by advanced AI features, and cross-lingual synthesis, to name a few.


All users of Synthesizer V Studio Pro also receive the free vocal Mai. Mai is a bright and energetic feminine pop vocalist originally recorded in Japanese, but able to sing clearly in English and Chinese using the cross lingual synthesis function. With Emotional and Soft vocal modes, Mai’s singing style can be adapted to a wide variety of productions. Like other Dreamtonics voices, there is no restriction on the monetization of songs created with Mai.


https://resource.dreamtonics.com/download/English/Voice%20Databases/Free%20Voice%20Databases%20for%20Synthesizer%20V%20Studio%20Pro/


https://store.dreamtonics.com/product-category/voice-database/


Voice Database


These seem to be about US$79 each


Unsorted


https://vocadb.net/


Welcome to the Vocaloid Database! The collaborative database for Vocaloid, UTAU and other singing synthesizers, with artists, discography, PVs and more.


https://alternativeto.net/software/vocaloid-3/

UTAU

See notes/links/etc in above sections


OpenUTAU

See notes/links/etc in above sections


Synthesizer V

See notes/links/etc in above sections


Alter/Ego

See notes/links/etc in above sections


Emvoice:


Vocal creation and arrangement tool unlike anything on the market.


We're building a Text-to-Voice engine that can sing and speak expressively. We're giving music producers access to the only virtual instrument they lack: the human voice.
Having a vocal on a track can massively increase its commercial potential. To put it simply, successful music is almost exclusively vocal. Yet, most music producers now lack the time, money, or adequate conditions to make vocal music. Any major pop song production contains 40 to 50 vocal tracks, or more. Giving every producer access to high-quality voices has the potential to change the way music is produced, and by whom.
Emvoice One has a unique ability: it can produce singing and speaking voices. It will be the testbed for our core technology.


https://emvoiceapp.com/


VST/AU/AAX - Mac/PC - Internet connection required for use - Purchase voices in-app


Voices seem to cost ~US$50-80ish


Sinsy: https://alternativeto.net/software/sinsy/about/


Singing voice synthesis system based on HMM algorithm


Sinsy generates audio files from lyrics-annotated MusicXML scores. A web service demo is available online, featuring a more recent version than the one found in the official public repository.
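
For reference, a "lyrics-annotated MusicXML score" is a score in which each `<note>` carries a `<lyric>` element. The sketch below builds a minimal one-measure example with the standard library. `build_score` is a hypothetical helper, and the output has not been checked against Sinsy, which may need further elements (tempo, key, clef, and so on).

```python
import xml.etree.ElementTree as ET


def build_score(notes, divisions=1):
    """Build a one-measure MusicXML score.

    notes: list of (step, octave, duration_in_divisions, lyric) tuples.
    """
    score = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Voice"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = str(divisions)

    for step, octave, duration, lyric in notes:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(duration)
        # The lyric annotation is what tells the synthesizer what to sing.
        lyric_el = ET.SubElement(note, "lyric")
        ET.SubElement(lyric_el, "text").text = lyric

    return ET.tostring(score, encoding="unicode")


if __name__ == "__main__":
    print(build_score([("C", 4, 1, "twin"), ("C", 4, 1, "kle")]))
```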


https://www.sinsy.jp/
https://github.com/mathigatti/midi2voice


Singing Synthesis from MIDI file
This script relies on the sinsy.jp website from the Nagoya Institute of Technology which implements a HMM-based Singing Voice Synthesis System.


DeepVocal: https://alternativeto.net/software/deepvocal/about/


DeepVocal is a free to use singing synthesizer application software developed by Boxstar


https://www.deep-vocal.com/

Doesn't look like there have been any real updates since 2019


https://vocalsynth.fandom.com/wiki/DeepVocal


DeepVocal is a singing synthesis program created by Boxstar. It is the successor to Sharpkey.


RenoidPlayer: https://alternativeto.net/software/renoidplayer/about/


RenoidPlayer is a web-based, free vocal synthesizer that runs on Renoise XRNI and SoundFont SF2.


https://www.g200kg.com/renoid/
https://vocadb.net/T/7621/renoid


Released on September 12th, 2012, Renoid is a free online-based singing synthesis engine developed by Sato-san. It can either be played within Renoise, in a different DAW via SoundFont, or used in-browser via Renoid Player. Nine voicebanks exist, four of which (Honoka Mei, Asane Bow, Nagone Mako, and Kasane Teto) were originally from UTAU and five of which (Nina, Jutero, Hana, Qurio, and Robozawa-200kg) were made directly for the program.


https://vocadb.net/T/8108/renoise


Renoise is a Digital Audio Workstation (DAW) intended for music creation. Through the use of plugins, Renoise can also support vocal synthesis such as those using Renoid voicebanks.


https://www.renoise.com/


Renoise is a complete, multi-platform and expandable Digital Audio Workstation. It lets you record, compose, edit, process and render production-quality audio using a music-tracker based approach.


Chipspeech: https://alternativeto.net/software/chipspeech/about/


chipspeech is a vocal synthesizer that recreates vintage vocal synths from the 1980s, developed by Plogue.


chipspeech is a vintage-style speech synthesizer which recreates the sound of famous 80's voice synthesis chips. It features 12 different voices, each with its own characteristic timbre. It is specially tailored for musical needs – simply type in your lyrics, and then play on your MIDI keyboard. It’s a true synthesizer, the sound can be extensively modified for easy and expressive performances. chipspeech also features a circuit bending emulation, letting you not only recreate the insane and chaotic sound of a circuit bent TI speaking device, but also use it in a controlled, musical way.


eCantorix: https://alternativeto.net/software/ecantorix/about/


eCantorix is a singing synthesis frontend for espeak. It works by using espeak to generate raw speech samples, then adjusting their pitch and length and finally creating a LMMS project file referencing the samples in sync to the input file.
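
The "adjusting their pitch and length" step can be pictured as computing two factors per note: how far to shift the spoken sample's pitch, and how much to stretch its duration. The sketch below shows that arithmetic only. It is not eCantorix's actual code, and `midi_to_hz` and `adjustment` are hypothetical names.

```python
def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def adjustment(sample_hz: float, sample_seconds: float,
               target_note: int, target_seconds: float):
    """Return (pitch_ratio, stretch_factor) needed to fit a sample to a note.

    pitch_ratio > 1 means the sample must be shifted up;
    stretch_factor > 1 means it must be lengthened.
    """
    pitch_ratio = midi_to_hz(target_note) / sample_hz
    stretch_factor = target_seconds / sample_seconds
    return pitch_ratio, stretch_factor


if __name__ == "__main__":
    # A syllable spoken at roughly 220 Hz for 0.5 s, sung as A4 for one second.
    print(adjustment(220.0, 0.5, 69, 1.0))
```

The two factors are kept independent on purpose: naive resampling changes pitch and length together, so a real tool has to apply them separately.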


https://github.com/divVerent/ecantorix


Singing synthesis frontend for espeak




https://espeak.sourceforge.net/


eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.
eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.
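
To illustrate why formant synthesis is so compact: a vowel can be approximated by passing a pulse train (the "voice source") through a few resonant filters tuned to formant frequencies, with no recorded speech involved. The sketch below is a toy cascade of second-order resonators in the style of Klatt synthesizers, not eSpeak's implementation. The formant values in `AH` are rough illustrative numbers, and all function names are hypothetical.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000


def resonator(signal, freq, bandwidth, rate=SAMPLE_RATE):
    """Filter a signal through a second-order digital resonator."""
    t = 1.0 / rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def glottal_pulses(f0, seconds, rate=SAMPLE_RATE):
    """Very crude voice source: one impulse per pitch period."""
    period = max(1, round(rate / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(int(seconds * rate))]


def vowel(f0, formants, seconds=0.5):
    """Synthesize a vowel at pitch f0 from (frequency, bandwidth) formant pairs."""
    signal = glottal_pulses(f0, seconds)
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalise to [-1, 1]


def write_wav(path, samples, rate=SAMPLE_RATE):
    """Write mono 16-bit PCM samples in [-1, 1] to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


# Rough formant (frequency Hz, bandwidth Hz) pairs for an "ah"-like vowel (assumed values).
AH = [(700, 130), (1220, 70), (2600, 160)]
```

For example, `write_wav("ah.wav", vowel(220, AH))` writes half a second of a buzzy "ah" at A3. Changing `f0` changes the sung pitch while the formants keep the vowel identity, which is the property singing synthesizers rely on.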


https://github.com/espeak-ng/espeak-ng


eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.


The eSpeak NG is a compact open source software text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents. It is based on the eSpeak engine created by Jonathan Duddington.
eSpeak NG uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings. It also supports Klatt formant synthesis, and the ability to use MBROLA as backend speech synthesizer.


https://www.reddit.com/r/opensource/comments/lrzq3s/is_there_a_foss_equivalent_to_vocaloidutau/

Sinsy

See notes/links/etc in above sections


OpenUtau

See notes/links/etc in above sections


https://www.quora.com/Is-there-a-cheap-maybe-open-source-alternative-to-vocaloid-voice-synthesizer

UTAU

See notes/links/etc in above sections


Alter/Ego

See notes/links/etc in above sections


CEVIO Creative Studio

See notes/links/etc in the CeVIO section below


Synthesizer V

See notes/links/etc in above sections


Sharpkey: https://deepvocal.fandom.com/wiki/Sharpkey


Sharpkey was a singing synthesizer application software developed by Boxstar. It was replaced by the Sharpkey Galaxy software, which was then replaced by the now modern DeepVocal.


MUTA

https://vocadb.net/T/3247/muta


MUTA is a discontinued Chinese synthesizer developed by 厦门优他动漫科技有限公司. As a whole, the program had five vocals: Yan Xi, Hupo Xu Yan, Weiyang, Liliko, and Xin Ge Ping; the latter two were only available through the mobile app. Three more voicebanks had been announced (Xi Sheng, Li Xiang, and You Hui), but only Xi Sheng ever received demos. MUTA has since ended development.


https://muta.fandom.com/wiki/MUTA_Wiki


MUTA is a Chinese voice synthesizing program that was developed and released by AI-MUTA in 2015. The synthesizer allows anyone to create music from their computer in seconds!


NIAOniao: https://vocalsynth.fandom.com/wiki/NIAONiao


NIAONiao Virtual Singer (袅袅虚拟歌手, Niǎoniǎo Xūnǐ Gēshǒu) is a Chinese voice synthesizer program developed by dsound.
The default voicebank is named Yu Niaoniao (余袅袅), however, users can create their own voicebank and take advantage of its larger file feature. NIAONiao can import MIDI files, VSQX files (VOCALOID3 only), and UST files, export tracks as the "Niao" file format (*.nn), and can render vocal tracks directly as WAV, MP3, or MIDI files.


VOCALINA: https://vocalsynth.fandom.com/wiki/VOCALINA


VOCALINA (보카리나) is a "text to speech" singing synthesizer and DAW for personal music-related content. It was developed in October 2011 by TGENS Co., Ltd. On September 25, 2017, it was announced by the CEO (Kang Woo-Mo) that VOCALINA's Service would be terminated on October 1, 2017. Contrary to this, on the 29th of September in 2017, the CEO announced they would work to extend the service for one more year, as thanks to the community for their encouragement and support. Additionally, VOCALINA would be free to use until the service expired.


DeepVocal

See notes/links/etc in above sections


Nakloid

See notes/links/etc in above sections


Cadencii

See notes/links/etc in above sections


https://vocalsynth.fandom.com/wiki/Vocal_Synthesizer_Wiki

Lists a few synths on the front page:

AITalk: https://vocalsynth.fandom.com/wiki/AITalk


AITalk (エーアイトーク) is a speech synthesis program created by AI, Inc. It has been used as the base for VOICEROID, galaco Talk, Otomachi Una Talk EX, Gynoid Talk, and A.I.VOICE.


AquesTone: https://vocalsynth.fandom.com/wiki/AquesTone


AquesTone is a VSTi plugin developed by Aquest. There are four voice options: Female F1, Auto F1, Male HK, and Auto HK. UTAU Utane Uta (aka. Defoko) uses Female voice 1 as a source for the UTAU default voicebank.


CANTOR: https://vocalsynth.fandom.com/wiki/CANTOR


CANTOR is a singing synthesis software created by VirSyn


https://www.virsyn.net/en/E_Products/E_CANTOR/e_cantor.html


CeVIO: https://vocalsynth.fandom.com/wiki/CeVIO


CeVIO is a speech and singing synthesis software created by Frontier Works, Inc


https://cevio.jp/
CeVIO Creative Studio: https://vocalsynth.fandom.com/wiki/CeVIO_Creative_Studio


CeVIO Creative Studio is a program created by Frontier Works, Inc. It is capable of both speaking and singing.


CeVIO AI: https://vocalsynth.fandom.com/wiki/CeVIO_AI


CeVIO AI is the successor to CeVIO Creative Studio. It was announced in December 2018 and is capable of both speaking and singing and utilizes deep learning technology.


CeVIO Pro (aka VoiSona): https://vocalsynth.fandom.com/wiki/VoiSona


VoiSona, formerly CeVIO Pro (チェビオ Pro), is a digital audio workstation (DAW), VSTi-compatible, commercial vocal synthesizer product capable of both speech and singing synthesis, and is the next generation of the CeVIO voice synthesis technology developed by the CeVIO Project in collaboration with Techno-Speech, Inc. The alpha (α) version of CeVIO Pro is slated to be released on February 24, 2022 for free download.


https://voisona.com/


Free AI Singing Software.
VoiSona AI voice synthesis software realistically reproduces human singing. Developed and refined over many years, its AI technology produces superior synthesized voice quality. Available on both Windows and macOS with VSTi/Audio Units (AU) support, VoiSona is now easier to use and supports the needs of professionals.
VoiSona’s first artist is Chis-A. She is a magnetic singer with an androgynous voice. Chis-A is included in VoiSona’s default voice library under a user-friendly license. You will find her valuable to your creative endeavors.


CeVIO Creative Studio- and CeVIO AI-exclusive voice libraries cannot be used with VoiSona at present. Cross-platform support for each voice library is being considered.


VoiSona is available on Windows and macOS.
You can add VoiSona to your DAW as a VSTi plug-in or an Audio Unit (AU) plug-in.
Use VoiSona in your DAW for all processes from inputting score data to adjusting the singing voice to mixing.
VoiSona is also available as a standalone application for use without a DAW.


https://cevio.fandom.com/wiki/VoiSona


VoiSona, formerly known as CeVIO Pro (チェビオ Pro (仮)), is an audio workstation (DAW), VSTi-compatible, commercial vocal synthesizer software that reproduces realistic singing voices with AI technology, and is the sister brand of the CeVIO voice synthesis technology developed by Techno-Speech, Inc. in collaboration with CeVIO Project. The alpha (α) version known as CeVIO Pro was released on February 24, 2022 for free download. The beta (β) version was released on June 2, 2022 under the new name "VoiSona", with the official production version released on September 1, 2022.


DeepVocal

See notes/links/etc in above sections


ETHERA

TODO


NEUTRINO: https://vocalsynth.fandom.com/wiki/NEUTRINO


NEUTRINO is a Japanese neural voice synthesizer program developed by SHACHI.
It is compatible with Windows, macOS, and Linux. Web browser compatibility is based in Google Drive.


The user uploads data in the MusicXML format, which the NEUTRINO program reads to output a WAV file of the generated voice. Gender factor, vibrato intensity, and pitch shift can be adjusted prior to output.


https://studio-neutrino.com/


NEUTRINO Diffusion: AI Singing Voice Generator


https://studio-neutrino.com/#library


NEUTRINO SINGER LIBRARY
A unique singing voice library
As of July 2023, we have published the singing voice libraries of 13 people.


https://studio-neutrino.com/blog/

https://studio-neutrino.com/1591/


NEUTRINO Diffusion – Muon v2.x update
One year has passed since the official release of NEUTRINO, and it has evolved into the second generation (Muon).
We have carried out a complete renewal of the algorithm and model.
The second generation (Muon) uses the latest generation AI model, the Diffusion model, which makes it possible to generate voices with a higher sense of realism and rich singing expression. It has also been improved in terms of functionality, with the voice changing every time it makes an inference, and the quality and processing speed can be changed.
I hope this will be of help to you in your production.


https://studio-neutrino.com/535/


NEUTRINO Diffusion Download


All versions, including past versions, can be downloaded from this page. When using, after downloading the NEUTRINO main unit, please download the singing voice library separately and decompress and copy it.


Piapro Studio: https://vocalsynth.fandom.com/wiki/Piapro_Studio


Piapro Studio (ピアプロスタジオ) is a VSTi produced and developed by Crypton Future Media, Inc. It was meant for use in a DAW (digital audio workstation), but has since received standalone releases. Piapro Studio was first packaged with KAITO V3 and has been released with almost every Crypton release since.
Piapro Studio originally used VOCALOID as its core synthesizing engine, but later went on to become its own independent software.


https://piaprostudio.com/


Synthesizer V

See notes/links/etc in above sections


UTAU

See notes/links/etc in above sections


VOCALOID

See notes/links/etc in above sections


https://vocalsynth.fandom.com/wiki/Category:Software

Has 345 entries at time of writing


https://vocalsynth.fandom.com/wiki/Category:Open_source

VOICEVOX: https://vocalsynth.fandom.com/wiki/VOICEVOX


VOICEVOX is an open source, deep learning, reading/text-to-speech synthesizer software developed by Hiho.


OpenUtau

See notes/links/etc in above sections


TALQu: https://vocalsynth.fandom.com/wiki/TALQu


TALQu (とーく) is an open source, deep learning, reading/text-to-speech synthesizer software developed by Haruqa Software (Haruqaのソフトウェアとか).


ALYS: https://vocalsynth.fandom.com/wiki/ALYS


ALYS is a vocal for Alter/Ego. She is the second vocal produced for the software, the first French vocal, and was developed by VoxWave. In addition to her French vocal she also has a Japanese vocal. ALYS' voice is provided by the French Utaite, Poucet, and is illustrated by Saphirya.
In December 2021, ALYS became an open-source project and both her Alter/Ego as well as her prototype UTAU voicebanks were released.


https://github.com/mathigatti/RealTimeSingingSynthesizer


Live Coding Singing Synthesizer
Real Time Singing Synthesizer project made from sinsy-NG. The idea was to generate vocal audio samples on real time easily for live coding performances.


https://github.com/Aozhi/Sinsy-NG


The HMM-Based English&Japanese&Chinese Singing Synthesis with espeak-ng


https://github.com/GloomyGrave/Sinsy-NG


(discontinued) 🎵The Formant-Based All Language Singing Voice Synthesis System: Sinsy-NG


https://github.com/topics/vocaloid
https://github.com/facebookresearch/audiocraft


Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.


https://arxiv.org/abs/2301.11325


MusicLM: Generating Music From Text


See Also

My Other Related Deepdive Gists and Projects


Music APIs and DBs (0xdevalias' gist)
AI Voice Cloning / Transfer (eg. RVCv2) (0xdevalias' gist)
Audio Pitch Correction (eg. autotune, melodyne, etc) (0xdevalias' gist)
Automated Audio Transcription (AAT) / Automated Music Transcription (AMT) (aka: converting audio to midi) (0xdevalias' gist)
Generating Synth Patches with AI (0xdevalias' gist)
Compare/Diff Audio Files (0xdevalias' gist)
Working Around FLStudio Trial Limitations (0xdevalias' gist)