nateraw/songstarter-v0-2-demo.ipynb

## songstarter-v0-2-demo.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "machine_shape": "hm",
      "gpuType": "A100",
      "authorship_tag": "ABX9TyPN2wiQy975sqo8qrLuiKZa",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/nateraw/0cb4c242b70af10044e9ae73f4617c86/songstarter-v0-2-demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "%%capture\n",
        "! pip install git+https://github.com/facebookresearch/audiocraft\n",
        "! pip install torchvision==0.16.0\n",
        "! pip install hf-transfer"
      ],
      "metadata": {
        "id": "U9mIvEuyANKI"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "\n",
        "# enables faster downloads from hugging face\n",
        "os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\""
      ],
      "metadata": {
        "id": "rOmrzpFOBy9G"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WsugXWQuAMjQ"
      },
      "outputs": [],
      "source": [
        "import torchaudio\n",
        "from audiocraft.models import MusicGen\n",
        "from audiocraft.data.audio import audio_write\n",
        "from huggingface_hub import hf_hub_download\n",
        "\n",
        "model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Generate music from text descriptions"
      ],
      "metadata": {
        "id": "XxkbC2puEBVq"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "model.set_generation_params(duration=10)  # generate 10 seconds.\n",
        "descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3\n",
        "wav = model.generate(descriptions)  # generates 3 samples.\n",
        "\n",
        "for idx, one_wav in enumerate(wav):\n",
        "    # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.\n",
        "    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy=\"loudness\", loudness_compressor=True)"
      ],
      "metadata": {
        "id": "uHnG93FZA5Lz"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from IPython.display import Audio\n",
        "\n",
        "# Listen to the first sample\n",
        "Audio('0.wav')"
      ],
      "metadata": {
        "id": "M3x2Kg4lCVhy"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "We trained the model for text-to-music, but since the base model had melody conditioning, we get that for free.\n",
        "\n",
        "That means you can generate samples that sound like some other sample you have.\n",
        "\n",
        "Let's first listen to our melody sample:"
      ],
      "metadata": {
        "id": "cghgF0-7DR8K"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "melody_path = hf_hub_download(\"nateraw/musicgen-songstarter-v0.2\", 'assets/bach.mp3')\n",
        "Audio(melody_path)"
      ],
      "metadata": {
        "id": "u5QiT6svDASW"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now lets take that bach organ sample and remix it into a hip hop guitar sample"
      ],
      "metadata": {
        "id": "4TRqtV13D1bj"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "melody, sr = torchaudio.load(melody_path)\n",
        "# generates using the melody from the given audio and the provided descriptions.\n",
        "wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)\n",
        "\n",
        "for idx, one_wav in enumerate(wav):\n",
        "    # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.\n",
        "    audio_write(f'{idx}_bach', one_wav.cpu(), model.sample_rate, strategy=\"loudness\", loudness_compressor=True)"
      ],
      "metadata": {
        "id": "_XzkcSVQA4mt"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Listen to the first melody conditioned sample\n",
        "Audio('0_bach.wav')"
      ],
      "metadata": {
        "id": "WLrYymKPCbjR"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}
	{
	"nbformat": 4,
	"nbformat_minor": 0,
	"metadata": {
	"colab": {
	"provenance": [],
	"machine_shape": "hm",
	"gpuType": "A100",
	"authorship_tag": "ABX9TyPN2wiQy975sqo8qrLuiKZa",
	"include_colab_link": true
	},
	"kernelspec": {
	"name": "python3",
	"display_name": "Python 3"
	},
	"language_info": {
	"name": "python"
	},
	"accelerator": "GPU"
	},
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "view-in-github",
	"colab_type": "text"
	},
	"source": [
	"<a href=\"https://colab.research.google.com/gist/nateraw/0cb4c242b70af10044e9ae73f4617c86/songstarter-v0-2-demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
	]
	},
	{
	"cell_type": "code",
	"source": [
	"%%capture\n",
	"! pip install git+https://github.com/facebookresearch/audiocraft\n",
	"! pip install torchvision==0.16.0\n",
	"! pip install hf-transfer"
	],
	"metadata": {
	"id": "U9mIvEuyANKI"
	},
	"execution_count": null,
	"outputs": []
	},
	{
	"cell_type": "code",
	"source": [
	"import os\n",
	"\n",
	"# enables faster downloads from hugging face\n",
	"os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\""
	],
	"metadata": {
	"id": "rOmrzpFOBy9G"
	},
	"execution_count": null,
	"outputs": []
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"id": "WsugXWQuAMjQ"
	},
	"outputs": [],
	"source": [
	"import torchaudio\n",
	"from audiocraft.models import MusicGen\n",
	"from audiocraft.data.audio import audio_write\n",
	"from huggingface_hub import hf_hub_download\n",
	"\n",
	"model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')"
	]
	},
	{
	"cell_type": "markdown",
	"source": [
	"Generate music from text descriptions"
	],
	"metadata": {
	"id": "XxkbC2puEBVq"
	}
	},
	{
	"cell_type": "code",
	"source": [
	"model.set_generation_params(duration=10) # generate 10 seconds.\n",
	"descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3\n",
	"wav = model.generate(descriptions) # generates 3 samples.\n",
	"\n",
	"for idx, one_wav in enumerate(wav):\n",
	" # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.\n",
	" audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy=\"loudness\", loudness_compressor=True)"
	],
	"metadata": {
	"id": "uHnG93FZA5Lz"
	},
	"execution_count": null,
	"outputs": []
	},
	{
	"cell_type": "code",
	"source": [
	"from IPython.display import Audio\n",
	"\n",
	"# Listen to the first sample\n",
	"Audio('0.wav')"
	],
	"metadata": {
	"id": "M3x2Kg4lCVhy"
	},
	"execution_count": null,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"source": [
	"We trained the model for text-to-music, but since the base model had melody conditioning, we get that for free.\n",
	"\n",
	"That means you can generate samples that sound like some other sample you have.\n",
	"\n",
	"Let's first listen to our melody sample:"
	],
	"metadata": {
	"id": "cghgF0-7DR8K"
	}
	},
	{
	"cell_type": "code",
	"source": [
	"melody_path = hf_hub_download(\"nateraw/musicgen-songstarter-v0.2\", 'assets/bach.mp3')\n",
	"Audio(melody_path)"
	],
	"metadata": {
	"id": "u5QiT6svDASW"
	},
	"execution_count": null,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"source": [
	"Now lets take that bach organ sample and remix it into a hip hop guitar sample"
	],
	"metadata": {
	"id": "4TRqtV13D1bj"
	}
	},
	{
	"cell_type": "code",
	"source": [
	"melody, sr = torchaudio.load(melody_path)\n",
	"# generates using the melody from the given audio and the provided descriptions.\n",
	"wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)\n",
	"\n",
	"for idx, one_wav in enumerate(wav):\n",
	" # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.\n",
	" audio_write(f'{idx}_bach', one_wav.cpu(), model.sample_rate, strategy=\"loudness\", loudness_compressor=True)"
	],
	"metadata": {
	"id": "_XzkcSVQA4mt"
	},
	"execution_count": null,
	"outputs": []
	},
	{
	"cell_type": "code",
	"source": [
	"# Listen to the first melody conditioned sample\n",
	"Audio('0_bach.wav')"
	],
	"metadata": {
	"id": "WLrYymKPCbjR"
	},
	"execution_count": null,
	"outputs": []
	}
	]
	}