@0xdevalias
Last active April 22, 2024 17:52
Some notes on generating software synthesizer patches with AI

Generating Synth Patches with AI

Table of Contents

People Worth Following / Watching

In no particular order:

Generating Synth Patches with AI

  • See also:
  • https://micromusic.tech/
  • https://github.com/gudgud96/syntheon
    • Syntheon

    • Syntheon provides parameter inference for music synthesizers using deep learning models. Given an audio sample, Syntheon infers the best parameter preset for a given synthesizer that can recreate the audio sample.

    • Found via: https://forum.vital.audio/t/has-anyone-tried-syntheon-for-vital-synplant-ai-for-vital/13617
    • https://www.youtube.com/watch?v=nZ560W6bA3o
      • Parameter Inference of Music Synthesizers with Deep Learning - Hao Hao Tan - ADC22

        Synthesizers are crucial for designing sounds in today's music. However, to create the desired sound texture by tuning the right synthesizer parameters, one requires years of training and in-depth domain experience on sound design. Music producers might also search through preset banks, but it takes extensive time and effort to find the best preset that gives the desired texture.

        Imagine a program that you can drop your desired audio sample, and it automatically generates the synthesizer preset that could recreate the sound. This task is commonly known as "parameter inference" of music synthesizers, which could be a useful tool for sound design. In this talk, we will discuss how deep learning techniques can be used towards solving this task. We will cover recent works that use deep learning to perform parameter inference on a variety of synthesizers (FM, wavetable, etc.), as well as the challenges that were faced in solving this task.

      • https://docs.google.com/presentation/d/1PA4fom6QvCW_YG8L0MMVumrAluljcymndNlaK2HW5t0/
        • ADC 2022 Parameter Inference

        • TODO: There seem to be lots of good papers/etc referenced in these slides.. worth copying them out here for easier reference
      • https://gudgud96.github.io/publications/
        • Publications / Talks

  • https://github.com/spiegelib/spiegelib
    • SpiegeLib: Synthesizer Programming with Intelligent Exploration, Generation, and Evaluation Library.

    • An object oriented Python library for research and development related to Automatic Synthesizer Programming. SpiegeLib contains a set of classes and base classes for developing and evaluating algorithms for generating parameters and patch settings for synthesizers.

    • https://spiegelib.github.io/spiegelib/
  • https://github.com/SlavaCat118/Vinetics
  • https://jakespracher.medium.com/generating-musical-synthesizer-patches-with-machine-learning-c52f66dfe751
    • Generating Musical Synthesizer Patches with Machine Learning (2021)

    • https://github.com/jakespracher/ml-synth-preset-generator/
      • Generating Musical Synthesizer Patches with Machine Learning. This repo accompanies this blog post. The premise is to generate high-quality presets for the Ableton Analog synthesizer in a particular style automatically using generative machine learning models.

    • I tried generating presets using two different models: a variational autoencoder (VAE) and a generative adversarial network (GAN). The GAN had better performance, as it is a more sophisticated model.

    • I tried various architectures, one of the best performing was the Wasserstein GAN. The one I implemented was based on this tutorial.

    • More powerful synths will make training the GAN more challenging as they have many more configuration parameters than Ableton Analog. Additionally, there is the aforementioned issue of software interoperability: we can’t easily read and write presets for the most popular synths. What would it look like to get this working with a synth like Serum?

      VST fxp presets follow a predefined structure based on a specification. However, Serum presets are “Opaque Chunk” format meaning the data we care about is an opaque sequence of ones and zeros.

      Fortunately, it is still possible to make some sense of them. I was able to figure out that the chunk data is compressed by Zlib. We can decompress, make arbitrary single changes, and compare the result to the initial patch to reverse engineer the format.

      It would also theoretically be possible to build a VST host that loads the synthesizers, manipulates parameters, and writes presets using the VST interface but writing a custom VST host seemed like a lot of work so I figured it would be easier to start with the XML.

    • The networks that I’ve built fully rely on the preset configuration parameters as the training data the network learns from. An additional possibility that would be much harder to set up would be to also use the waveform generated by the sound as input to the model. I suspect this could significantly improve performance because the waveform is what a human would use to determine the aesthetic desirability of a sound. However, generating these waveforms from entire preset libraries in a suitable format would require a lot of scripting work.

  • https://www.reddit.com/r/synthesizers/comments/hn7pg9/i_trained_an_ai_to_generate_synth1_presets/
    • I trained an AI to generate Synth1 Presets (2020)

    • This site allows you to generate preset banks for the famous free VST plugin Synth1 using an AI I built and trained myself. I hope you find this interesting, and if you make any tracks using it I'd love to know!

      • https://www.thispatchdoesnotexist.com/
        • Daichi Laboratory's Synth1 is a virtual synthesizer based off of the Nord Lead 2 and is the most downloaded synth plug-in of all time. Click the floating synth above (or here), and an AI will generate a zipped bank of 128 presets for Synth1.

        • This project was heavily inspired by Nintorac's This DX7 Cartridge Does Not Exist. Make sure to check that out also!

          • https://www.thisdx7cartdoesnotexist.com/
            • The Yamaha DX7 is a classic synthesizer often cited as the sound of the 80's, this site uses a specially trained AI to create completely novel preset cartridges.

            • If you are interested in the code it can be found here though it is mostly undocumented.

        • I used a Generative Adversarial Network (GAN), which is essentially making two neural networks fight against each other for our own benefit. The AI is made up of two parts, a generator and a discriminator. The discriminator learns to detect whether a given preset is fake or not, while the generator learns how to fool the discriminator. Over time, the discriminator gets better at detecting fakes, while the generator gets better at generating fakes. At the end, the generator ends up producing pretty convincing fakes, which are then sent to you! The names are generated from a Recurrent Neural Network (RNN) based off of the synth parameters. There's a lot of duplicates however, further training and model finagling is needed. If you want a more technical explanation, check out this additional blog post I wrote.
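
Going back to Syntheon (linked above): if the project README is accurate at the time of writing, parameter inference is exposed through a single infer_params call. A rough usage sketch, with a placeholder filename:

    # Sketch based on the Syntheon README: infer_params takes an audio file and
    # a synth name ("vital" is one of the supported synths) and writes out a
    # preset whose parameters approximate the input sound.
    from syntheon import infer_params

    output_preset_path, eval_dict = infer_params(
        "my_sample.wav",   # audio to approximate (placeholder filename)
        "vital",           # target synthesizer
        enable_eval=True,  # also return similarity metrics for the re-synthesis
    )

    print(output_preset_path)  # path to the generated preset file
    print(eval_dict)           # how closely the re-synthesised audio matches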

AI Synths / Plugins / etc

  • https://soniccharge.com/synplant
    • Synplant takes a genetic twist on sound design by moving beyond traditional knob-twisting and dial-adjusting, emphasizing exploration and discovery. Here, your ears guide you through a forest of organic textures and evolving timbres.

    • Genopatch crafts synth patches from audio recordings, using AI to find optimal synth settings based on your source sample. As the strands in the user interface sprout and grow, they generate patches that increasingly match the chosen audio.

  • https://guk.ai/sistema-ai-synthesizer
    • Sistema is the first AI-powered software instrument that helps you easily create pro-quality sounds for your music.

    • Sistema uses AI models to provide endless new sounds and textures, helping you to overcome creative blocks. Unique preset names add a fun twist to each session.

    • Sistema's advanced technology produces high-quality sound without the clutter of traditional synthesizers. Control various aspects of your sounds with the built-in macro collection featuring multiple effects.

    • Create or tweak sounds in any genre from Hip-Hop, EDM, Rock & Pop to House, Indie & Metal.

    • US$149 or US$30/mo for 6 months

Synths

A short list of some software synths that might be useful to explore generating patches for:

  • Vital: https://vital.audio/
    • Spectral warping wavetable synth

  • Serum: https://xferrecords.com/products/serum/
  • Native Instruments Massive: https://www.native-instruments.com/en/products/komplete/synths/massive/
    • Virtual-analog architecture for colossal sound. Equally flexible in the studio or on stage. Comprehensive library with 1,300 huge presets.

  • Sylenth1: https://www.lennardigital.com/sylenth1/
    • Sylenth1 is a virtual analog VSTi synthesizer that takes the definitions of quality and performance to a higher level. Until now only very few software synthesizers have been able to stand up to the sound quality standards of hardware synths. Sylenth1 is one that does. Sylenth1 is not just another synth. It was built from a producer's point of view. It was built to produce superior quality sound and music. It was built to perform. A lot of research has been invested in order to achieve unheard warmth and clarity. The graphical interface ensures the highest level of usability so you can fully unleash your creativity.

  • Synth1: https://daichilab.sakura.ne.jp/softsynth/index.html
    • This is a software synthesizer intended for use with DTM software. Compatible with VSTi plug-in/AU plug-in format. This software is freeware.

    • Functionally, it is modeled after that red synth, the Clavia Nord Lead 2

  • etc

Synth Patches

  • https://github.com/instatetragrammaton/Patches
    • Patches and theory for your software synths

    • This repository contains patches for various software synthesizers. Often, these are remakes - clean-room reverse engineered - of the sounds you can hear in popular music. Others have a more academical approach - to teach or explain certain concepts. I am not a fan of "studio secrets". I have been fortunate enough to learn from many people at no cost; it is only fitting that I return the favor for others, keeping the threshold as low as possible for everyone.

  • https://github.com/Miserlou/SynthRecipies
    • Random Serum Patches

  • https://vst-preset-generator.org/
    • The VST Preset Generator is software to create random presets for VST plugins.

    • The VST Preset Generator writes preset files (fxp for program patch or fxb for bank patch) with randomized values. This is a tool for lazy or curious sound designers, who want to experiment random theory with their VST plugins.

    • Source: https://svn.tuxfamily.org/viewvc.cgi/vpg_vst-preset-gen/
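
The VST Preset Generator above writes standard VST2 .fxp/.fxb files with randomized values. For non-chunked plugins this is simple enough to do by hand, since the fxProgram layout from the VST2 SDK (vstfxstore.h) is just a big-endian header followed by the parameter floats. A minimal sketch, where the plugin ID and parameter count are placeholders that would need to match the target plugin:

    # Minimal sketch of writing a VST2 "FxCk" program preset (.fxp) filled with
    # random parameter values (0.0..1.0). Field order follows the fxProgram
    # struct from the VST2 SDK; everything is big-endian.
    import random
    import struct

    def write_random_fxp(path, plugin_id=b"XXXX", num_params=16, name="Random"):
        # fxMagic, version, fxID, fxVersion, numParams, prgName[28]
        body = struct.pack(
            ">4si4sii28s",
            b"FxCk", 1, plugin_id, 1, num_params,
            name.encode("ascii")[:28],
        )
        body += struct.pack(f">{num_params}f",
                            *(random.random() for _ in range(num_params)))
        # chunkMagic + byteSize (size of everything after these first 8 bytes)
        header = struct.pack(">4si", b"CcnK", len(body))
        with open(path, "wb") as f:
            f.write(header + body)

    write_random_fxp("random_preset.fxp")  # 'XXXX' plugin ID is a placeholder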

Learning Manual Synth Patch Design

  • https://www.syntorial.com/
    • Syntorial is video game-like training software, that will teach you how to program synth patches by ear. With almost 200 lessons, combining video demonstrations with interactive challenges, you’ll get hands on experience programming patches on a built-in soft synth, and learn everything you need to know to start making your own sounds with ease.

  • https://unison.audio/reverse-engineer-presets-in-serum/
    • A Complete Guide To Reverse-Engineering Any Preset In Serum (2018)

    • Note: This is more about manually reverse engineering and re-creating the sound, not reverse engineering the Serum patch file format itself

Interacting with VSTs from code
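
(Nothing collected here yet beyond the pedalboard PoC referenced in a couple of the comments linked below.) As a starting point, a rough sketch of enumerating a synth plugin's parameters and rendering a note with Spotify's pedalboard; the plugin path is a placeholder, and attribute details may differ between pedalboard versions:

    # Rough sketch: load an instrument plugin with Spotify's pedalboard, dump its
    # automatable parameters, and render a few seconds of audio from MIDI.
    # The plugin path is a placeholder; the mido package supplies the MIDI
    # Message objects used below.
    from mido import Message
    from pedalboard import load_plugin

    synth = load_plugin("/Library/Audio/Plug-Ins/VST3/Serum.vst3")  # placeholder path

    # Enumerate the plugin's exposed parameters (name -> current value/metadata)
    for name, param in synth.parameters.items():
        print(name, param)

    # Render a note so different parameter settings can be compared by ear /
    # spectrogram later
    audio = synth(
        [Message("note_on", note=48), Message("note_off", note=48, time=3)],
        duration=4.0,      # seconds of audio to render
        sample_rate=44100,
    )
    print(audio.shape)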

Reverse engineering Serum patch format

  • https://www.reddit.com/r/edmproduction/comments/69hxa7/reverse_engineering_serums_fxp_files/

    • Reverse engineering Serum's .fxp files? (2017)

    • I'm trying to build a program to generate new presets for Serum. Serum stores them as a .fxp file, which seems to be a header followed by opaque data. Is there documentation on how to extract parameter values from these files somewhere, or am I out of luck?

    • Update: I reached out to Steve through Xfer's forum and he was prompt and helpful. Unfortunately, the .fxp file format is completely dependent on the source code, and he can't release a spec for it without making the code open source. Which probably isn't happening any time soon.

  • Random notes from decompilation:

    • 002f7730  void** juce::SystemStats::getJUCEVersion(void** arg1)
      ..snip..
      002f7759      __builtin_strcpy(dest: &rax_1[4], src: "JUCE v6.0.4")
      ..snip..
      
      • https://github.com/juce-framework/JUCE/releases/tag/6.0.4
      • ChatGPT: To create a package that contains all the JUCE headers and sources for your project, similar to the JuceLibraryCode folder found in JUCE's examples and extras, you would typically use the Projucer, JUCE's project management tool.

      • It seems that Projucer also lets us pick the C++ Language Standard that will be used for the project, and defaults to C++14 (options are C++11, C++14, C++17, use latest)
      • Choosing C++11 raised a warning that "Module(s) have a higher C++ standard requirements than project."
      • Once that is done, we can add a path like this to the 'Compiler Flags' section of Binary Ninja: -I/Users/devalias/Desktop/path/to/juce-6.0.4-osx/modules
      • And attempt to import the JuceHeader.h file
      • But in doing so we get an error like this, which apparently could be resolved by pointing Binary Ninja at the C++ standard libraries (even though I believe it already should be.. so maybe we don't have the 'right' c++ standard libraries installed by default?):
      • Even if we remove the juce_analytics module within Projucer and attempt to re-import into Binary Ninja, we end up with an error like this:
        • error: /Users/devalias/Desktop/path/to/juce-6.0.4-osx/modules/juce_core/system/juce_TargetPlatform.h:56:3 "No global header file was included!"
          error: /Users/devalias/Desktop/path/to/juce-6.0.4-osx/modules/juce_core/system/juce_StandardHeader.h:46:10 'algorithm' file not found
          2 errors generated.
          
          • To fix the juce_TargetPlatform.h:56:3 "No global header file was included!" part of the error, within Projucer, select the 'Project Settings', scroll down to 'Use Global AppConfig Header', and change it from 'Disabled' to 'Enabled'. When this is disabled, these settings are injected via the build system, which means we would need to specify those settings to Binary Ninja in a more manual way.. enabling it seems to simplify that step.
          • To fix the C++ standard library not being found error(s), they seem to go away when we specify a full path to the c++ includes, rather than just pointing to the higher level path
            • -isystem/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/
              -isystem/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include
              -isystem/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/15.0.0/include
              -isystem/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include
              -I/Users/devalias/Desktop/path/to/juce-6.0.4-osx/modules
              
          • Though after doing that, I end up with a different error (unsure if related to the above being the wrong approach to take):
            • error: /Users/devalias/Desktop/path/to/juce-6.0.4-osx/modules/juce_core/streams/juce_InputStream.h:261:5 CType::TemplateSpecialization not implemented
              1 error generated.
              
            • As best I can currently figure out.. this might be a limitation of Binary Ninja's header importer.. though I'm not 100% sure about that
    • Looking at serum presets, we can see various chunks of data at the start of the file like: CcnK, BFPCh, XfsX, etc (along with some non-printable hex characters)
      • Searching for CcnK showed a few variations (CcnK, CcnKH, CcnKL), with the most relevant looking code snippets being the following:
        • char const chunkheader[0x5] = "CcnK", 0
        • void* data_7d46c8 = chunkheader {"CcnK"}
        • int64_t SerumDSP::SavePreset(int64_t* arg1)
      • Searching for BFPCh didn't turn up anything, but FPCh did, with the most relevant looking code snippets being the following:
        • char const data_67cfa8[0x5] = "FPCh", 0
        • int64_t SerumDSP::SavePreset(int64_t* arg1)
      • Searching the decompiled binary for XfsX, showed the most relevant looking code snippets to be the following:
        • char const data_67cfb0[0x5] = "XfsX", 0
        • int64_t SerumDSP::SavePreset(int64_t* arg1)
    • Searching for symbols mentioning 'save' there are a whole bunch of results, but after filtering out all of those prefixed with _mus, juce, _sqlite, etc; the ones that looked most potentially interesting to me were:
      • Symbiosis::saveProperties(void const*, FSRef const*)
      • VSTGUI::CDrawContext::saveGlobalState() / VSTGUI::CDrawContext::restoreGlobalState()
      • VSTGUI::CGDrawContext::saveGlobalState() / VSTGUI::CGDrawContext::restoreGlobalState()
      • int64_t SerumGUI::SavingPresetStuff(void* arg1)
        • Basically only contained the following:
          • SerumGUI::DoDialog(arg1, "Serum Demo version", "Sorry, the demo does not save presets!", 0) __tailcall
      • int64_t SerumGUI::Save_Buffer_To_File(void* arg1)
        • Basically only contained the following:
          • SerumGUI::DoDialog(arg1, "Serum Demo version", "Sorry, the demo does not save tables!", 0)
      • int64_t SerumDSP::saveDataBufferToWav(void* arg1)
        • return SerumDSP::makeDialog(arg1, "Save Aborted!!", "Sorry, the demo version does not allow you to export wav files.", &data_6b5d74, 0, (zx.o(0)).d) __tailcall
      • int64_t SerumDSP::SaveFXRackSettings(int64_t* arg1)
        • Seems to use the magic bytes FXRK as a header when writing the file
        • Called from the SerumGUI::notify function
      • int64_t SerumDSP::SaveFXSingleSettings(void* arg1, char arg2)
        • Seems to use the magic bytes FXRS as a header when writing the file
        • Called from the SerumGUI::notify function
      • int64_t SerumDSP::SaveMIDIMap(void* arg1)
      • int64_t SerumDSP::SaveShape_Old(void* arg1, char arg2)
      • int64_t SerumDSP::SaveShape_New(void* arg1, int32_t arg2)
        • Seems to use the magic bytes SERUMSHP
      • int64_t SerumDSP::SavePreset(int64_t* arg1)
        • __builtin_strncpy(dest: rax_14, src: "CcnK", n: 8)
          __builtin_memcpy(dest: rax_14 + 8, src: "\x46\x50\x43\x68\x00\x00\x00\x01\x58\x66\x73\x58\x00\x00\x00\x01\x00\x00\x00\x01", n: 0x14)
          
          • This seems to use the magic bytes CcnK, and then within the hex of the __builtin_memcpy, it also includes the magic bytes FPCh and XfsX
    • Searching for symbols mentioning 'load' there are a whole bunch of results, but after filtering out all of those prefixed with juce, _sqlite, etc; the ones that looked most potentially interesting to me were:
      • void* Symbiosis::VSTPlugIn::loadFXPOrFXB(void* arg1, int64_t arg2, int32_t* arg3)
        • Called from:
          • int64_t Symbiosis::SymbiosisComponent::convertVSTPreset(void* arg1, int64_t arg2, char arg3)
          • void* Symbiosis::SymbiosisComponent::auSetProperty(void* arg1, int32_t arg2, int32_t arg3, int32_t arg4, int32_t* arg5, int32_t arg6)
      • int64_t Symbiosis::SymbiosisComponent::loadConfiguration(void* arg1)
      • int64_t Symbiosis::loadFromFile(int64_t arg1, int64_t* arg2)
        • Called from:
          • int64_t Symbiosis::SymbiosisComponent::convertVSTPreset(void* arg1, int64_t arg2, char arg3)
          • int64_t Symbiosis::SymbiosisComponent::loadFactoryPresets(void* arg1, int64_t arg2)
          • int64_t Symbiosis::SymbiosisComponent::readParameterMapping(void* arg1, int64_t arg2)
      • int64_t Symbiosis::SymbiosisComponent::loadFactoryPresets(void* arg1, int64_t arg2)
        • Called from:
          • int64_t Symbiosis::SymbiosisComponent::loadOrCreateFactoryPresets(void* arg1)
      • int64_t Symbiosis::SymbiosisComponent::loadOrCreateFactoryPresets(void* arg1)
        • Mentions SYFactoryPresets.txt
      • int64_t AudioEffectX::beginLoadBank() __pure
        • This seems to be an empty function.. patched out in the demo version maybe?
      • int64_t AudioEffectX::beginLoadProgram() __pure
        • This seems to be an empty function.. patched out in the demo version maybe?
      • uint64_t PresetManager::LoadDBAtPath(char** arg1, char* arg2, int64_t arg3)
        • Called from:
          • int64_t SerumGUI::PlotDBListInPresetBrowser(void* arg1, int32_t arg2, int32_t arg3, int64_t arg4)
          • int64_t SerumGUI::checkDefaultPath(int64_t* arg1, char arg2)
            • Seems to check paths that include:
              • /System/user.dat
              • System/presetdb.dat
              • /Library/Audio/Presets/Xfer Records/Serum Presets/
            • And call functions like:
              • SerumGUI::PopulateWaveTableMenus(arg1)
              • SerumGUI::RefreshFilesToVector(arg1, &var_a68, 2)
              • SerumGUI::PopulateTableMenuOfType(arg1, 2)
              • SerumGUI::updateLFOShapeMenus(arg1)
              • SerumGUI::updateFXUnitsMenus(arg1)
              • SerumGUI::Formulas_Init(arg1)
              • PresetManager::LoadDBAtPath(arg1[0xc], &var_668, &var_a68)
              • SerumGUI::sqlte_CheckDBHeader(arg1, rsi_27, rdx_12, 1)
              • SerumGUI::updatePresetMenu(arg1, 1)
              • SerumGUI::updateTableMenuCheckmarks(arg1, 0)
      • uint64_t SerumGUI::LoadExec_This_Sound_From_File(void* arg1, char* arg2, int32_t arg3, int32_t arg4, int32_t arg5)
        • Called from:
          • int64_t SerumGUI::valueChanged(int64_t* arg1, int64_t* arg2, double arg3[0x2] @ zmm0)
          • uint64_t SerumGUI::notify(int64_t* arg1, int64_t arg2, int64_t arg3, uint32_t arg4[0x4] @ zmm0)
          • int64_t SerumGUI::doEvenIfClosedIdle(int64_t* arg1)
        • Calls functions like:
          • SerumGUI::ParsePathFromMediaBayXML(rdi_1, arg2)
          • SerumGUI::CheckIfValidSoundFile(_strcat(&var_438, i_1 + 0x15), &var_438)
      • void SerumGUI::LoadWT(void* arg1, int32_t arg2, int16_t arg3)
      • void SerumGUI::setFlagForPresetLoadFromDSP(void* arg1, int32_t arg2)
      • int64_t SerumDSP::DoXPlatLoadStuff(int64_t* arg1, char* arg2, int32_t arg3, int32_t arg4, int64_t arg5, char arg6)
        • Called from:
          • int64_t SerumDSP::setParameter() __pure
          • and seemingly potentially other places too
        • Calls functions like:
          • SerumDSP::ImportWT(r14_1, arg2, r15_1)
          • SerumDSP::ImportWTS(r14_1, arg2, r15_1)
          • SerumDSP::ImportAWV(r14_1, arg2, r15_1, 0)
          • SerumDSP::ImportAWV(r14_1, arg2, r15_1, 1)
          • SerumDSP::ImportKRI(r14_1, arg2, r15_1)
          • SerumDSP::ImportMF2(r14_1, arg2, r15_1)
          • SerumDSP::Import256(r14_1, arg2, r15_1)
          • SerumDSP::ImportWDF(r14_1, arg2, r15_1, rbx.d)
          • SerumDSP::ImportWAVETABLE(r14_1, arg2, r15_1)
          • SerumDSP::ImportPNG2WT(r14_1, arg2, r15_1)
          • SerumDSP::LoadPreset(arg1)
          • SerumGUI::setWaveCustomEdited(rdi_37, arg5.d, 1)
      • int64_t SerumDSP::LoadTuningFile(void* arg1, int32_t arg2)
        • Called from:
          • uint64_t SerumGUI::notify(int64_t* arg1, int64_t arg2, int64_t arg3, uint32_t arg4[0x4] @ zmm0)
          • and seemingly potentially other places too
        • Calls functions like:
          • CTuningMap::ReadFromFile(arg1 + 0x122940, *(arg1 + 0x39520))
      • int64_t SerumDSP::DoNoiseLoad(void* arg1, char* arg2, char arg3, int32_t arg4)
        • Called from:
          • int64_t SerumDSP::setParameter() __pure
          • int64_t SerumDSP::initNoiseTable(void* arg1, char arg2)
            • Reads from paths like:
              • /Library/Audio/Presets/Xfer Records/Serum Presets/
              • Organics/AC hum1.wav
              • Noises
        • Calls functions like:
          • SerumDSP::calculateNoiseTable(arg1, r13_9, i_37.d, r14_11)
      • uint64_t SerumDSP::LoadPreset(int64_t* arg1)
        • Called from:
          • int64_t SerumGUI::doEvenIfClosedIdle(int64_t* arg1)
          • int64_t SerumDSP::DoXPlatLoadStuff(int64_t* arg1, char* arg2, int32_t arg3, int32_t arg4, int64_t arg5, char arg6)
          • int64_t SerumDSP::CheckForDefaultFXPAndLoad(int64_t* arg1)
          • int64_t SerumDSP::LoadRandomizedPreset(int64_t* arg1, int32_t arg2, int64_t* arg3)
      • int64_t SerumDSP::LoadShape_New(int64_t* arg1, char arg2, char arg3, uint128_t arg4 @ zmm0, uint64_t arg5[0x2] @ zmm1)
        • if (_strncmp(rax_3, "SERUMSHP", 8) != 0)
          _free(rax_3)
          return SerumDSP::LoadShape_Old(arg1, zx.d(rbx_1), arg3) __tailcall
          
        • Called from:
          • int64_t SerumDSP::setParameter() __pure
      • int64_t SerumDSP::LoadShape_Old(int64_t* arg1, int32_t arg2, char arg3)
        • Called from:
          • int64_t SerumDSP::LoadShape_New(int64_t* arg1, char arg2, char arg3, uint128_t arg4 @ zmm0, uint64_t arg5[0x2] @ zmm1)
      • int64_t SerumDSP::CheckForDefaultFXPAndLoad(int64_t* arg1)
        • Presets/User/default.fxp
        • Called from:
          • int64_t SerumDSP::setParameter() __pure
          • int64_t SerumDSP::SerumDSP(int64_t* arg1, int64_t arg2)
      • int64_t SerumDSP::CheckForDefaultMIDIMapAndLoad(void* arg1)
        • System/MIDI CC Maps/default.mmp
        • Called from:
          • int64_t SerumDSP::SerumDSP(int64_t* arg1, int64_t arg2)
      • int64_t SerumDSP::LoadMIDIMap(void* arg1)
        • Called from:
          • uint64_t SerumGUI::notify(int64_t* arg1, int64_t arg2, int64_t arg3, uint32_t arg4[0x4] @ zmm0)
          • int64_t SerumDSP::CheckForDefaultMIDIMapAndLoad(void* arg1)
      • int64_t SerumDSP::CheckForProgramChangeMapAndLoad(void* arg1)
        • System/ProgramChanges.txt
        • Called from:
          • int64_t SerumDSP::SerumDSP(int64_t* arg1, int64_t arg2)
      • int64_t SerumDSP::LoadRandomizedPreset(int64_t* arg1, int32_t arg2, int64_t* arg3)
        • Called from:
          • int64_t SerumGUI::doEvenIfClosedIdle(int64_t* arg1)
      • int64_t SerumDSP::LoadFXSingleSettings(int64_t* arg1, int32_t arg2)
        • Called from:
          • int64_t SerumGUI::valueChanged(int64_t* arg1, int64_t* arg2, double arg3[0x2] @ zmm0)
          • uint64_t SerumGUI::notify(int64_t* arg1, int64_t arg2, int64_t arg3, uint32_t arg4[0x4] @ zmm0)
      • int64_t SerumDSP::LoadFXRackSettings(int64_t* arg1)
        • Called from:
          • uint64_t SerumGUI::notify(int64_t* arg1, int64_t arg2, int64_t arg3, uint32_t arg4[0x4] @ zmm0)
    • Some other random potentially interesting locations
      • int64_t** queryUpdateServer(int64_t** arg1, int64_t* arg2, int32_t* arg3, char* arg4)
        • https://xferrecords.com/api/update_check/?version=&os=&plugin_type=&host=&hostver=&tstamp=
          
          X-API-Key: organist-diction-model-molehill
          Content-Type: application/json
          Accept: application/json; version=1
          
      • uint64_t SerumGUI::sNhBe(int64_t* arg1, int32_t arg2)
      • void SerumGUI::idle(int64_t* arg1)
        • int64_t rax
          int64_t var_18 = rax
          if (arg1[4] != 0 && arg1[0x205].b != 0)
              if (arg1[0x63b].b == 0)
                  SerumGUI::doIdle(arg1)
              if ((arg1[0x63b].b == 0 && arg1[0x63e].b == 0) || (arg1[0x63b].b != 0 && arg1[0x63e].b == 0))
                  int64_t rcx_1
                  char const* const rdx_1
                  char const* const rsi_1
                  int64_t* rdi_1
                  if (*(arg1[1] + juce::LookAndFeel_V3::getTreeViewIndentSize) == 0)
                      rsi_1 = "\n Demoversion has timed out!"
                      rdx_1 = "Please remove and re-insert Serum"
                      rdi_1 = arg1
                      rcx_1 = 0x65
                  else
                      rsi_1 = "Thank you for trying the Serum Demo!"
                      rdx_1 = "The demo does not save, and stops producing sound after 20 minutes per use."
                      rdi_1 = arg1
                      rcx_1 = 0x64
                  SerumGUI::DoDialog(rdi_1, rsi_1, rdx_1, rcx_1)
              arg1[0x63e].b = 1
          
      • uint64_t SerumGUI::notify(int64_t* arg1, int64_t arg2, int64_t arg3, uint32_t arg4[0x4] @ zmm0)
        • if (VSTGUI::CVSTGUITimer::kMsgTimer == arg3)
              r13 = 1
              if (arg1[0x63b].b != 0)
                  SerumGUI::doIdle(arg1)
          else
          
        • if (VSTGUI::CNewFileSelector::kSelectEndMessage == arg3)
        • in the else block of if (arg2 == 0)
        • somewhere nested after that it mentions strings like:
          • (*(rcx_60 + 0x2e0))(rax_106, "Enter number of samples per frame", "Click \'Guess\' to have Serum guess frame size from pitch.", rcx_60)
            __builtin_strncpy(dest: arg1[0x3b8] + 0x1ea, src: "2048", n: 5)
            
        • under label_4f663f
          • SerumDSP::LoadFXSingleSettings(arg1[1], zx.q(r14_4))
        • then in various other parts of the function
          • SerumGUI::checkDefaultPath(arg1, 0)
          • SerumDSP::LoadMIDIMap(arg1[1])
          • SerumDSP::LoadTuningFile(arg1[1], 0)
          • SerumDSP::LoadFXRackSettings(arg1[1])
          • SerumGUI::LoadExec_This_Sound_From_File(rdi_7, rsi_4, rdx_13, rcx_12, 0)
          • SerumGUI::Export_Buffer_To_256(arg1, rbx_3.d, 1, *(arg1[1] + 0x39520), arg4[0])
          • under label_4f6214 (and in another place close by)
            • SerumGUI::DoDialog(arg1, "Serum Demo version", "Sorry, the demo does not save tables!", 0)
          • under label_4f6350
            • SerumGUI::SavingTableStuff(arg1, r8_1.d - 9, 0, 0, 1, arg4[0])
          • if (rax_46 != 0)
                _fclose(rax_46)
                _strcpy(&var_338, *(arg1[1] + 0x39520))
                EditorFileNameOnly(&var_338, 0)
                __builtin_strcpy(dest: &var_338 + _strlen(&var_338), src: " already exists! Do you want to replace it?")
                r13 = 1
                SerumGUI::DoDialog(arg1, &var_338, "Click Cancel to Cancel save, or …", 1)
            else
                label_4f6380:
                SerumGUI::DoDialog(arg1, "Serum Demo version", "Sorry, the demo does not save presets!", 0)
                r13 = 1
            
      • int64_t SerumGUI::CheckIfFileExists(int64_t arg1, int64_t arg2)
      • int64_t SerumGUI::CheckIfFileExistsAndEnumerate(int64_t arg1, int32_t* arg2, char arg3)
    • Symbiosis::VSTPlugIn::loadFXPOrFXB
    • SerumDSP::SavePreset
    • SYParameters.txt
  • https://unison.audio/reverse-engineer-presets-in-serum/

    • A Complete Guide To Reverse-Engineering Any Preset In Serum

  • https://xferrecords.com/forums/general/file-types

    • Serum presets will show "XfsX" for the 16-20th chars of the file

    • Serum preset file format is complex/programmatic and isn't public. I hope to make it more open in the future / new format with conversion or import.

      There is public info on the Serum wavetable format. This is an example 'clm ' chunk from a Serum-created wavetable (.wav) File:

      <!>2048 01000000 wavetable (www.xferrecords.com)

      Serum currently assumes 2048 (samples per frame) at all times, so as of now 2048 should always be written there.

      Only the two first flags are currently used:

      • the first flag is the WT interpolation [0 = no interpolation, 1 = linear crossfades, 2,3,4 = spectral morph]
      • the second flag is "Serum Factory WT" which means Serum assumes this file comes with Serum and everyone already has it - thus it will not embed in to user presets to keep file sizes down. PLEASE DO NOT ENABLE THIS FLAG IF YOU ARE CREATING WAVETABLES - please leave it to zero, thank you very much. If you want a similar flag for yourself to identify tables as factory or otherwise for your product, drop me a line and I will reserve you a flag or a different value on that flag.
    • https://xferrecords.com/forums/general/file-types#post_80648
      • See my PoC code for enumerating the synth plugin settings + extracting patch details via Spotify's pedalboard in this comment
  • https://www.reddit.com/r/edmproduction/comments/69hxa7/reverse_engineering_serums_fxp_files/

    • I'm trying to build a program to generate new presets for Serum. Serum stores them as a .fxp file, which seems to be a header followed by opaque data. Is there documentation on how to extract parameter values from these files somewhere, or am I out of luck?

    • Update: I reached out to Steve through Xfer's forum and he was prompt and helpful. Unfortunately, the .fxp file format is completely dependent on the source code, and he can't release a spec for it without making the code open source. Which probably isn't happening any time soon.

    • You can reverse engineer by exporting fxp files for every parameter change to create a spec that would produce compatible presets. Then build your program using VST SDK's fxProgram struct.

      I would try first with a simple plugin with just couple values OR just ask the dev to open the code, some strictly forbid decompiling their software :V

    • Exactly, it seems like those are the only options unfortunately. I've heard some have success with doing changing a parameter value in the .fxp file, and successfully using that preset in Serum. I'm just annoyed that the process can't be automated.

  • https://forum.vital.audio/t/idea-convert-serum-presets-to-vital/2580

    • Idea: Convert Serum presets to Vital As Serum and Vital seem to share quite a lot of common ground, I was thinking - would it be possible to write a script that converts Serum presets and wavetables to Vital?

    • You can copy the serum wavetable and sample files into your vital folders. Works fine

    • I have tried to convert some, but vital lacks an effect device like the expander/unison tool in Serum. also, remapping and reverb lfos which Serum patches make use of alot, do not exist in Vital yet. vital sounds very warm somehow, whereas Serum is crisp by default. two very different souls of equal beauty

    • I had a similar thought, but after I tried to implement several Serum patches manually on Vital, I came to the conclusion that many items are not a simple conversion, due to several differences which could not be realized on one synth or the other. The other problem is the saved preset format. Vital presets are saved as text files with all the parameters listed out in human readable form. Serum presets are saved in a binary format, which are machine readable, and unless someone on the net has a map to how this is done, you would have to reverse engineer it yourself - which is a big project.

      But, I’ll tell you how I would start: save an Init patch, then begin by changing one knob at a time, save the preset and inspect the values in the preset file. You will begin to see what the pattern is, then you would have to write a program to convert the Serum preset into text. Once you had most of the parameters covered, you could begin to correlate the values of one synth against the other. Then, you could start devising a method to import that preset text into Excel or some spread sheet program. Some of the values do not have a one to one correlation between each other, so you would need to come up with formulas to convert Serum values to the similar Vital values.

      There are many parameters that translate easier than others. Osc and filter values are pretty easy to figure out, except for exclusive features, and others like the effects, as noted in another post, are just different and cannot be done. A few of the complex filters in Serum can be realized by cascading the 2 filters in Vital, but not all of them. Try converting a well documented Serum patch to Vital yourself, and you’ll likely begin to see the problems. My bottom line is I’m going to have to buy Serum if I want to make those sounds easily

    • https://forum.vital.audio/t/idea-convert-serum-presets-to-vital/2580/14?u=devalias
      • See my PoC code for enumerating the synth plugin settings + extracting patch details via Spotify's pedalboard in this comment
  • https://www.kvraudio.com/forum/viewtopic.php?t=420599&start=2355

    • As you know, Serum's preset manager, while being one of the best of many VSTs out there, it's still lacking in basic features such as delete. Save on spot without full database re-scan and etc etc.

      However over time I've been given so many presets for serum via signups to newsletters or those that come with sample packs etc that I've got more SHAIT presets than good ones, not to mention those annoying duplicates with just different names. So I wrote a little tool to get rid of all the shit.

      https://github.com/DarceyLloyd/serum-preset-collector

      The instructions are on the page on how to use it, just make backups before you start making all your changes of noises, wavetables, presets and the serumdatabase.dat file (sqlite database file).

      I would have wrote something to organise the noise and wavetable folders also but I can't be arsed to decode the fxp file format that serum uses to do that... Maybe in the future... Real shame Serum can't scan the noise and wavetable folders for each preset with missing noise & wavetable files and allow a bulk re-save on this.
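
Related to the Xfer forum post above: since the 'clm ' wavetable metadata lives in an ordinary RIFF chunk inside the .wav file, it can be pulled out with a plain RIFF walk. A quick sketch (filename is a placeholder):

    # Sketch: walk the RIFF chunks of a Serum wavetable .wav and print the 'clm '
    # metadata chunk described in the Xfer forum post (e.g. "<!>2048 01000000 ...").
    import struct

    def read_clm_chunk(path):
        with open(path, "rb") as f:
            riff, _size, wave = struct.unpack("<4sI4s", f.read(12))
            assert riff == b"RIFF" and wave == b"WAVE", "not a RIFF/WAVE file"
            while True:
                header = f.read(8)
                if len(header) < 8:
                    return None
                chunk_id, chunk_size = struct.unpack("<4sI", header)
                data = f.read(chunk_size)
                if chunk_size % 2:          # RIFF chunks are word-aligned
                    f.read(1)
                if chunk_id == b"clm ":
                    return data.decode("ascii", errors="replace")

    print(read_clm_chunk("some_wavetable.wav"))  # placeholder filename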

Parsing preset files from code (.fxp/.fxb/.vstpreset)
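
Nothing fleshed out here yet, but pulling together the notes above (the zlib observation from the blog post, plus the CcnK / FPCh / XfsX magic bytes seen in SerumDSP::SavePreset), here's a rough sketch of peeling open a Serum .fxp. The header layout below is the standard VST2 opaque-chunk program header; where exactly the zlib stream starts inside Serum's chunk is a guess on my part:

    # Rough sketch: parse the standard VST2 .fxp opaque-chunk ("FPCh") header and
    # try to inflate the Serum chunk data with zlib, per the notes above.
    # Header fields are big-endian; the zlib start offset is found by scanning.
    import struct
    import zlib

    def read_serum_fxp(path):
        data = open(path, "rb").read()
        (chunk_magic, _byte_size, fx_magic, _version,
         fx_id, _fx_version, _num_programs, _name, chunk_size) = struct.unpack(
            ">4si4si4sii28si", data[:60])
        assert chunk_magic == b"CcnK" and fx_magic == b"FPCh", "not an opaque-chunk fxp"
        assert fx_id == b"XfsX", "plugin ID doesn't look like Serum"
        opaque = data[60:60 + chunk_size]
        # The blog post above found zlib-compressed data inside the chunk; scan for
        # a plausible zlib header instead of assuming a fixed offset.
        for offset in range(min(len(opaque), 256)):
            if opaque[offset] == 0x78:  # common zlib CMF byte
                try:
                    return zlib.decompress(opaque[offset:])
                except zlib.error:
                    continue
        return None

    blob = read_serum_fxp("SomePreset.fxp")  # placeholder filename
    print(len(blob) if blob else "no zlib stream found")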

Rendering a Vital Synth Patch to audio from CLI

  • https://www.reddit.com/r/VitalSynth/comments/16hrx39/has_anyone_used_vital_in_headless_mode/
    • https://www.reddit.com/r/VitalSynth/comments/16hrx39/comment/k86801j/
    • See my comment for more specific details, but the most relevant snippets are below.
    • ⇒ /Applications/Vital.app/Contents/MacOS/Vital --help
      Usage:
        Vital [OPTION...]
      
      Vital polyphonic wavetable synthesizer.
      
      Help Options:
        -h, --help                          Show help options
      
      Application Options:
        -v, --version                       Show version information and exit
        --headless                          Run without graphical interface.
        --tabletowav                        Converts a vitaltable to wav file.
        --tableimages                       Renders an image for the table.
        --render                            Render to an audio file.
        -m, --midi                          Note to play (with --render).
        -l, --length                        Not length to play (with --render).
        -b, --bpm                           BPM to play (with --render).
        --images                            Render oscilloscope images (with --render).
      
      ⇒ cd ~/Desktop
      
      ⇒ /Applications/Vital.app/Contents/MacOS/Vital --render "/Users/devalias/Music/Vital/Factory/Presets/Plucked String.vital" --midi 48 --length 3 --bpm 120 --images
      
    • Though unfortunately, when I tried to do that, it seems to get a segmentation fault and crash:

      • Thread 0 Crashed:: JUCE Message Thread Dispatch queue: com.apple.main-thread
        0   Vital                                  0x105b4a238 juce::AudioFormatWriter::writeFromFloatArrays(float const* const*, int, int) + 40
        1   Vital                                  0x105efb2ac SynthBase::renderAudioToFile(juce::File const&, float, float, std::__1::vector<int, std::__1::allocator<int>>, bool) + 3132
        2   Vital                                  0x105ce39be processCommandLine(juce::String const&) + 6334
        3   Vital                                  0x105e59ffa StandaloneSynth::initialise(juce::String const&) + 570
        4   Vital                                  0x105b8c7e2 juce::JUCEApplicationBase::initialiseApp() + 1042
        5   Vital                                  0x105cbed5b juce::JUCEApplication::initialiseApp() + 11
        6   Vital                                  0x105b4a578 main + 216
        7   dyld                                0x7ff806ac241f start + 1903
        
    • And ChatGPT, analysing the full segfault crash report, gives the following explanation:

      • The crash report indicates that the application Vital crashed due to a segmentation fault (SIGSEGV), which is a type of crash where a program attempts to access a restricted area of memory.
        
        Here are the key points from the crash report:
        
        - **Process**: Vital (audio.vital.synth)
        - **Version**: 1.5.5
        - **Crashed Thread**: 0 JUCE Message Thread (main thread)
        - **Exception Type**: EXC_BAD_ACCESS (SIGSEGV)
        - **Exception Codes**: KERN_INVALID_ADDRESS at 0x0000000000000018
        
        The crash occurred in thread 0, which is the main thread of the application. The function call stack indicates that the crash happened during a call to `juce::AudioFormatWriter::writeFromFloatArrays`, which is a function from the JUCE framework used to write audio data. It appears that the application attempted to access memory at address `0x18`, which was not a valid address, leading to the segmentation fault.
        
        ..snip..
        
    • I raised this as a bug here: mtytel/vital#45
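
The --render path segfaults for me as noted above, but assuming that gets fixed, batch-rendering a folder of presets with those same CLI flags is just a subprocess loop. A sketch (paths are placeholders):

    # Sketch: batch-render every .vital preset in a folder using the CLI flags
    # shown above. Note: --render currently segfaults for me (see the bug linked
    # above), so treat this as aspirational until that's fixed.
    import pathlib
    import subprocess

    VITAL = "/Applications/Vital.app/Contents/MacOS/Vital"
    PRESET_DIR = pathlib.Path.home() / "Music/Vital/Factory/Presets"  # placeholder

    for preset in sorted(PRESET_DIR.glob("*.vital")):
        print(f"Rendering {preset.name}...")
        subprocess.run(
            [VITAL, "--render", str(preset),
             "--midi", "48", "--length", "3", "--bpm", "120", "--images"],
            check=False,  # keep going even if one preset crashes the renderer
        )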

Unsorted

Musings

Not sure how much music production stuff you've done, but if you're open to sharing a high level /2c on whether you think this is a practical thing, would be curious what your thoughts are on the idea of applying ML sort of concepts to 'audio generation', but not in a 'direct' way, so much as making synthesizer patches/presets/etc?

Some thoughts in that space would be:

  • basic text prompt -> synth patch (likely vital at first because free, and ideally serum also, but I probably need to reverse engineer their patch binary format first; or extract the relevant details via another indirect method)
  • 'clone'/'convert' a serum patch -> vital (or vice versa): not as simple as direct 1:1 parameter mapping from their file formats as they implement things differently, so I would imagine it might end up being sort of like an adversarial type thing to get 'closer' to the 'right' sound
  • probably others, but that's what is 'top of mind' at the moment

On my ai synth gist, in the unsorted section there’s a few links to projects/blogs that have done synth patch generation stuff.

I was re-reading the one that did it for ableton analog the other day; it sounded like a decent starting point

But one of their suggested future improvements (and what I was also thinking) would be to somehow add some better feedback on the output sound back into the model; either based on the audio it produces, or the spectrogram of it or similar

I was also thinking there are some models that can label audio based on various moods/danceability/etc; that I figured maybe could be used to enrich the ‘metadata’ that goes into the training; but not sure if they would work on like simple synth sounds; as I suspect they need the full composition to figure those sorts of things

The other area I was thinking about was sort of in the space of GAN’s; and maybe being able to give it a sound you want to get a patch for; and having it be able to learn to adjust all the synth params to get as close as possible to that sound. Where the reward feedback would basically be how ‘close’ to the sound it could get
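
One way that 'how close did we get to the target sound' reward could be scored is a simple log-mel spectrogram distance between the target audio and the audio rendered from the candidate patch. A rough sketch using librosa (not a tuned perceptual metric, just an illustration of the feedback-signal idea; filenames are placeholders):

    # Sketch: score how close a rendered candidate sounds to a target by
    # comparing log-mel spectrograms. Lower = closer.
    import librosa
    import numpy as np

    def spectral_distance(target_wav, candidate_wav, sr=44100):
        y_t, _ = librosa.load(target_wav, sr=sr, mono=True)
        y_c, _ = librosa.load(candidate_wav, sr=sr, mono=True)
        n = min(len(y_t), len(y_c))              # compare equal-length slices
        mel_t = librosa.feature.melspectrogram(y=y_t[:n], sr=sr)
        mel_c = librosa.feature.melspectrogram(y=y_c[:n], sr=sr)
        log_t = librosa.power_to_db(mel_t)
        log_c = librosa.power_to_db(mel_c)
        return float(np.mean(np.abs(log_t - log_c)))

    print(spectral_distance("target_sound.wav", "rendered_patch.wav"))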

And then a different area where AI is useful (there’s already at least 1 app for it, and Ableton 12 sounds like it added it): grouping/clustering existing samples and being able to search by vibe/qualities of them, find similar sounds, etc. As I always struggle to find the ‘right’ sounds that I want, without just going through everything and getting distracted along the way

Things like Splice’s Create tool are also kind of a neat application:

Basically anything that helps me get from “I want to do a thing” to “oh, that’s got the creative juices going!”


I think what would be really neat when training a model on various presets is if it could sort of 'bucket'/group the different synth presets so they are more tightly related, but potentially also train on multiple synths' worth at the same time, so that it can maybe 'share the learnings' among them. I don't really know enough about how that works at a deeper/more technical level, but I'm pretty sure bucketing was one of the methods they used when training StableDiffusion on different image sizes/etc


From (private chat link): https://chat.openai.com/c/08e6d859-c270-460d-bb89-31c395dab661

Me:

I believe with some of the StableDiffusion training, they divided the training data into different 'buckets' based on the image size dimensions. I think they did this so that they could train on a larger set of data, while having the model 'learn better' aspects about the items in the 'bucket'. I think this was done so that the overall model could share knowledge about general concepts 'between' all of the buckets (rather than having individual models trained on the smaller datasets of each individual bucket), while also gaining some benefit from dividing the data into those buckets (eg. maybe it learned that data in those buckets was more correlated in a certain way that might not generalise across all buckets)

What do you think about my assumptions here? Are they grounded in reality? Are there specific technical terms in machine learning / data science that refer to these concepts?

ChatGPT:

Your understanding aligns quite well with certain concepts in machine learning and data science, particularly in the context of training models like StableDiffusion. Let's break down your assumptions and relate them to technical terms and practices:

  1. Dividing Training Data into 'Buckets' Based on Image Size Dimensions: This is a strategy often used in data preprocessing and is known as "data stratification" or "data bucketing." The idea is to create subgroups or 'buckets' within the data that share common characteristics (in this case, image size). This can help in managing memory usage and computational efficiency, especially when dealing with a diverse range of image sizes.

  2. Training on a Larger Set of Data While Learning Specific Aspects in 'Buckets': This approach leverages the concept of "transfer learning." By training on a large dataset, the model learns general features and patterns. Then, by focusing on specific buckets, it can fine-tune its understanding of particular characteristics associated with those groups. This is common in deep learning where a model trained on a broad dataset is later fine-tuned on a more specific dataset.

  3. Sharing Knowledge Between Buckets for General Concepts: This is essentially the essence of "feature learning" in machine learning. The model learns to recognize and extract useful features or patterns that are applicable across different datasets or buckets. In neural networks, especially in layers closer to the input, the model tends to learn more general features that can be shared across different types of data.

  4. Learning Correlations in Data Within Buckets That Might Not Generalize Across All Buckets: This touches on the concept of "model specialization." Within each bucket, the model might learn nuances and patterns specific to that subgroup. However, these specialized patterns might not be applicable or might be less effective when applied to data outside that bucket. This is a balance between generalization (applying learned knowledge broadly) and specialization (focusing on specific patterns within a subgroup).

Your assumptions reflect a sophisticated understanding of how data stratification, transfer learning, feature learning, and model specialization can play roles in training complex models like those used in image generation. These strategies help in handling large and diverse datasets efficiently while aiming for both broad applicability and specialized proficiency.

Me:

Ok, lets say we want to apply those concepts (or any other relevant concepts) to training an AI model on software synthesizer patch files, so that it can learn to generate synth patches based on inputs from a user.

There are many different synths out there, and each one uses a different format for their patch files; but many of the concepts they each use are very similar. Ideally we would like to be able to train the model with presets from a number of different synths, and to have it learn the general concepts of 'good patch design', as well as how to apply these concepts to the specifics required to implement in each individual synth's patch format.

It would also be great if based on those learnings, it was able to 'translate' a synth patch for one synthesizer, into the format of another synth; and have it generate the same (or very close to it) sound. This is complicated by the fact that different synths will have different internals for how they translate the synth patch into a sound; and so even if the patch files have similar parameters (eg. oscilators, wave shape, filters, etc), they probably won't be a 1 to 1 mapping between these parameters on different synths for the same sound.

It would also be cool if we could use an existing sound as an input, and have the model produce a patch file for a given synth that is able to re-create that sound (or something close to it)

I don't know if this would be relevant, but I was also thinking that it might be interesting/useful to be able to use existing models to enrich the 'metadata' of various sounds produced; ideally in a way that we can help teach the model better 'human words' for describing the 'feeling' or 'style' of various sounds.

Other things that might be cool to be able to do with a model like this, is a 'style transfer' sort of thing, where I can provide an existing synth patch and/or audio file, and some prompt text, and have the original modified in a way that makes it more like the prompt.

ChatGPT:

You're envisioning an AI model that operates in a complex, multidimensional space, encompassing sound synthesis, format translation, audio analysis, and even semantic interpretation of audio characteristics. Here are some key machine learning and AI concepts that could be employed to realize such a system:

  1. Transfer Learning and Feature Learning: Given that different synthesizers use different patch formats, but share underlying concepts (like oscillators, waveforms, filters), a model could be trained to understand these common features. Transfer learning would allow the model to apply its understanding of synthesis concepts learned from one synth to another, despite differences in format.

  2. Cross-Modal Learning for Patch Translation: Translating a patch from one synth to another, where the internal sound generation mechanisms differ, requires the model to learn cross-modal mappings. This involves understanding how different parameter configurations in one synth correspond to those in another to produce similar sounds.

  3. Audio Analysis and Reverse Engineering: For the model to recreate a sound from an audio input, it needs to analyze the audio and reverse-engineer the parameters that might have produced it. This involves deep learning techniques in audio processing and synthesis.

  4. Semantic Analysis and Metadata Enrichment: Enriching the metadata of sounds with descriptors about their 'feeling' or 'style' involves natural language processing (NLP) and perhaps some form of sentiment analysis or style recognition. Training a model to understand and generate human-like descriptions of sounds involves bridging audio analysis with semantic language understanding.

  5. Style Transfer and Generative Models: Just like style transfer in images, this concept can be applied to audio. Here, the 'style' might refer to the characteristic sound of a synth patch. The model would need to understand the essence of a given sound and then apply this essence to modify another sound or synth patch according to a text prompt.

Implementing such a system would likely require a combination of deep neural networks for audio processing and synthesis, NLP models for semantic understanding, and potentially generative adversarial networks (GANs) for style transfer and patch generation tasks.

Given your background in software engineering and interest in AI, you might approach this project by:

  • Collecting and preprocessing data from various synthesizers, ensuring a diverse range of sounds and patch formats.
  • Experimenting with existing models for audio analysis and synthesis, and adapting them to your specific use case.
  • Incorporating NLP techniques for semantic analysis and metadata enrichment.
  • Iteratively training and fine-tuning the model, potentially starting with a simpler task (like understanding patches from a single synth) and gradually increasing complexity.

This project would be quite ambitious, blending multiple complex domains of AI, but it's certainly an intriguing concept with potential for innovative developments in sound synthesis and audio AI.

Me:

If we assume that the model itself doesn't need to directly generate any audio (as that could be done by using a generated patch with the synth itself); and we wanted to focus more specifically on the aspects of training the model to understand how to generate patches (ideally for different synths); would it make sense to collect all the patches for each synth into distinct 'buckets'?

Can you then also expand on more of the technical specifics that we should know about for implementing an algorithm that can learn to identify how each of these synth patches work?

If the algorithm needs a 'feedback'/'reward' as part of training, what would be the best methods we could use for this, keeping in mind all of our original ideas/goals for this model? Would we be able to use the audio/similar output from the synth patch as part of the reward mechanism?

ChatGPT:

Focusing on the generation of synth patches without the direct generation of audio simplifies the problem to an extent but still involves complex challenges. Here are some steps and technical specifics for implementing such a system:

Data Collection and Organization

  1. Data Stratification: Collecting patches for each synth and organizing them into distinct 'buckets' is a good approach. This allows the model to learn the specifics of each synth's patch format and the commonalities in sound synthesis across different synths.

  2. Data Standardization: Given that different synths have different patch formats, some level of standardization or normalization of data might be necessary. This could involve extracting and aligning comparable features from each patch format.

Learning Algorithm Implementation

  1. Feature Extraction: The model needs to understand the parameters in a synth patch and how they contribute to the final sound. Techniques like principal component analysis (PCA) or autoencoders could be used to identify the most relevant features in patches.

  2. Neural Networks: Deep learning models, particularly neural networks, could be well-suited for this task. Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs) can be used, depending on the data's structure and the relationships you wish to model.

  3. Transfer Learning: If some synths share similar parameters or synthesis methods, transfer learning can be employed to leverage knowledge gained from one synth model to inform about others.

Feedback/Reward Mechanism

  1. Supervised Learning with Audio Output: If you can pair each patch with a recording of its audio output, you can use this as ground truth for supervised learning. The model can then be trained to generate patches that, when played on the respective synth, produce the desired audio.

  2. User Feedback: Implement a system where users can rate the quality or suitability of generated patches. This feedback can be used as a reward signal to improve the model iteratively.

  3. Objective Metrics: Develop objective metrics based on sound characteristics (like timbre, texture, or harmony) that define a 'good' patch. These metrics can guide the training process.

  4. Reinforcement Learning: If a clear reward signal can be established (like user ratings or adherence to objective metrics), reinforcement learning could be an approach. The model would learn to generate patches that maximize the reward signal.

Considerations for Implementation

  • Computational Resources: Training deep learning models, especially those handling complex data like synth patches, requires significant computational resources.

  • Data Diversity: Ensure that the dataset includes a wide variety of patches, covering a broad spectrum of sounds and styles, to encourage generalizability in the model.

  • Iterative Development: Start with a simpler model, perhaps focusing on a single synth, and gradually expand to more complex scenarios and multiple synths.

  • User Interface: Consider the design of the user interface, especially if you plan to incorporate user feedback into the training process.

Implementing such a model combines aspects of audio processing, machine learning, and user interaction. It's a challenging yet potentially rewarding project, especially given your background in software engineering and IT.
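
Sketching the 'shared core across synths' idea from the chat above a little more concretely: one naive shape for it is a shared latent space with per-synth encoder/decoder heads over normalised (0..1) parameter vectors. Purely illustrative PyTorch, with all sizes and parameter counts made up:

    # Purely illustrative sketch of the "bucket per synth, shared core" idea:
    # per-synth encoders project each patch format into a shared latent space,
    # and per-synth decoders map that latent back out to a given synth's
    # (normalised 0..1) parameter vector. All dimensions here are invented.
    import torch
    import torch.nn as nn

    SYNTH_PARAM_COUNTS = {"vital": 700, "serum": 300, "synth1": 99}  # invented counts

    class SharedPatchModel(nn.Module):
        def __init__(self, latent_dim=64):
            super().__init__()
            self.encoders = nn.ModuleDict({
                name: nn.Sequential(nn.Linear(n, 256), nn.ReLU(),
                                    nn.Linear(256, latent_dim))
                for name, n in SYNTH_PARAM_COUNTS.items()
            })
            self.decoders = nn.ModuleDict({
                name: nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                    nn.Linear(256, n), nn.Sigmoid())
                for name, n in SYNTH_PARAM_COUNTS.items()
            })

        def forward(self, params, src_synth, dst_synth):
            z = self.encoders[src_synth](params)   # encode from source format
            return self.decoders[dst_synth](z)     # decode into target format

    # e.g. push a (fake) Serum patch through the shared space into Vital's format
    model = SharedPatchModel()
    fake_serum_patch = torch.rand(1, SYNTH_PARAM_COUNTS["serum"])
    vital_guess = model(fake_serum_patch, "serum", "vital")
    print(vital_guess.shape)  # torch.Size([1, 700])

Actually training something like this would presumably need paired or audio-matched patches across synths (or an audio-domain objective like the spectrogram distance sketched earlier) before cross-synth 'translation' means anything, so treat the above purely as a shape for the idea rather than a workable recipe.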

See Also

My Other Related Deepdive Gists and Projects
