henryamster/image_sequence_driven_quick_clip_guided_diffusion_hq_256x256_and_512x512.ipynb

## image_sequence_driven_quick_clip_guided_diffusion_hq_256x256_and_512x512.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Image_Sequence_Driven_Quick_CLIP_Guided_Diffusion_HQ_256x256_and_512x512.ipynb",
      "private_outputs": true,
      "provenance": [],
      "collapsed_sections": [],
      "machine_shape": "hm",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/henryamster/7d7b871149aae1b96e460e722e490bbb/image_sequence_driven_quick_clip_guided_diffusion_hq_256x256_and_512x512.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Gf9n360RoB6U"
      },
      "source": [
        "# IMAGE SEQUENCE QUICK CLIP GUIDED DIFFUSION \n",
        "\n",
        "\n",
        "<!-- [![Image of result](https://thumbs.gfycat.com/WeePlushElk-mobile.jpg)](https://thumbs.gfycat.com/WeePlushElk-mobile.mp4 \"Link Title\") -->\n",
        "## Credits\n",
        "\n",
        "Original notebook by **Katherine Crowson** (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings). It uses either OpenAI's 256x256 unconditional ImageNet or **Katherine Crowson's fine-tuned 512x512 diffusion model** (https://github.com/openai/guided-diffusion), together with **CLIP **(https://github.com/openai/CLIP) to connect text prompts with images.\n",
        "\n",
        "\n",
        "Katherine's original notebook can be found here:\n",
        "https://colab.research.google.com/drive/1QBsaDAZv8np29FPbvjffbE1eytoJcsgA\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "Modified by **Daniel Russell** (https://github.com/russelldc, https://twitter.com/danielrussruss) to include (hopefully) optimal params for quick generations in 15-100 timesteps rather than 1000, as well as more robust augmentations. **Dango233** and **nsheppard** helped improve the quality of diffusion in general, and especially so for shorter runs like this notebook aims to achieve.\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "image sequence driver modifications (https://github.com/shellward). \n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "Please be mindful of the computing resources you are using, and be sure to credit those listed above (you don't need to include me as my contribution was moving like two lines of code around.) I strongly recommend that you use your own work as a driver.\n",
        "\n",
        "If you don't have anything to use as a driver, I suggest learning any of the following:\n",
        "\n",
        "\n",
        "*   **Blender** (https://blender.org)\n",
        "*   **Processing** (https://processing.org)\n",
        "*   **P5JS** (https://p5js.org)\n",
        "*   **GLSL** (https://thebookofshaders.com)\n",
        "*   **Three.js** (https://threejs.org)\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qZ3rNuAWAewx",
        "cellView": "form"
      },
      "source": [
        "#@title Check available graphics card\n",
        "import torch\n",
        "# Check the GPU status\n",
        "device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n",
        "print('Using device:', device)\n",
        "!nvidia-smi"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yZsjzwS0YGo6",
        "cellView": "form"
      },
      "source": [
        "from google.colab import drive\n",
        "\n",
        "#@title Choose model here:\n",
        "diffusion_model = \"512x512_diffusion_uncond_finetune_008100\" #@param [\"256x256_diffusion_uncond\", \"512x512_diffusion_uncond_finetune_008100\"]\n",
        "google_drive = True \n",
        "\n",
        "#@markdown **NOTICE:** You must connect your google drive to run this notebook. The  model, as well as each frame of your animation will be saved there. If you revisit this colab in the future, you will not need to download the model again.\n",
        "yes_please = True "
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "mQE-fIMnYKYK",
        "cellView": "form"
      },
      "source": [
        "#@title Download/load diffusion model from google drive\n",
        "\n",
        "#@markdown The model will be stored at /content/drive/MyDrive/diffNotebook\n",
        "\n",
        "model_path = '/content/'\n",
        "if google_drive:\n",
        "    from google.colab import drive\n",
        "    drive.mount('/content/drive')\n",
        "    if yes_please:\n",
        "        model_path = '/content/drive/MyDrive/diffNotebook' \n",
        "\n",
        "if diffusion_model == '256x256_diffusion_uncond':\n",
        "    !wget --continue 'https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt' -P {model_path}\n",
        "elif diffusion_model == '512x512_diffusion_uncond_finetune_008100':\n",
        "    !wget --continue 'https://the-eye.eu/public/AI/models/512x512_diffusion_unconditional_ImageNet/512x512_diffusion_uncond_finetune_008100.pt' -P {model_path}\n",
        "\n",
        "if google_drive and not yes_please:\n",
        "    model_path = '/content/drive/MyDrive/diffNotebook' \n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4jxCQbtInUCN"
      },
      "source": [
        "# Install and import dependencies"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-_UVMZCIAq_r",
        "cellView": "form"
      },
      "source": [
        "#@title Clone repositories for CLIP & Guided Diffusion, install dependencies\n",
        "!git clone https://github.com/openai/CLIP\n",
        "!git clone https://github.com/crowsonkb/guided-diffusion\n",
        "!pip install -e ./CLIP\n",
        "!pip install -e ./guided-diffusion\n",
        "!pip install lpips datetime"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "JmbrcrhpBPC6",
        "cellView": "form"
      },
      "source": [
        "#@title Import libraries\n",
        "import gc\n",
        "import io\n",
        "import math\n",
        "import sys\n",
        "from IPython import display\n",
        "import lpips\n",
        "from PIL import Image, ImageOps\n",
        "import requests\n",
        "import torch\n",
        "from torch import nn\n",
        "from torch.nn import functional as F\n",
        "import torchvision.transforms as T\n",
        "import torchvision.transforms.functional as TF\n",
        "from tqdm.notebook import tqdm\n",
        "sys.path.append('./CLIP')\n",
        "sys.path.append('./guided-diffusion')\n",
        "import clip\n",
        "from guided_diffusion.script_util import create_model_and_diffusion, model_and_diffusion_defaults\n",
        "from datetime import datetime\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "import random"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "h4MHBPT1nirT"
      },
      "source": [
        "# Define necessary functions"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YHOj78Yvx8jP",
        "cellView": "form"
      },
      "source": [
        "#@title Functions\n",
        "def fetch(url_or_path):\n",
        "    if str(url_or_path).startswith('http://') or str(url_or_path).startswith('https://'):\n",
        "        r = requests.get(url_or_path)\n",
        "        r.raise_for_status()\n",
        "        fd = io.BytesIO()\n",
        "        fd.write(r.content)\n",
        "        fd.seek(0)\n",
        "        return fd\n",
        "    return open(url_or_path, 'rb')\n",
        "\n",
        "\n",
        "def parse_prompt(prompt):\n",
        "    if prompt.startswith('http://') or prompt.startswith('https://'):\n",
        "        vals = prompt.rsplit(':', 2)\n",
        "        vals = [vals[0] + ':' + vals[1], *vals[2:]]\n",
        "    else:\n",
        "        vals = prompt.rsplit(':', 1)\n",
        "    vals = vals + ['', '1'][len(vals):]\n",
        "    return vals[0], float(vals[1])\n",
        "\n",
        "def sinc(x):\n",
        "    return torch.where(x != 0, torch.sin(math.pi * x) / (math.pi * x), x.new_ones([]))\n",
        "\n",
        "def lanczos(x, a):\n",
        "    cond = torch.logical_and(-a < x, x < a)\n",
        "    out = torch.where(cond, sinc(x) * sinc(x/a), x.new_zeros([]))\n",
        "    return out / out.sum()\n",
        "\n",
        "def ramp(ratio, width):\n",
        "    n = math.ceil(width / ratio + 1)\n",
        "    out = torch.empty([n])\n",
        "    cur = 0\n",
        "    for i in range(out.shape[0]):\n",
        "        out[i] = cur\n",
        "        cur += ratio\n",
        "    return torch.cat([-out[1:].flip([0]), out])[1:-1]\n",
        "\n",
        "def resample(input, size, align_corners=True):\n",
        "    n, c, h, w = input.shape\n",
        "    dh, dw = size\n",
        "\n",
        "    input = input.reshape([n * c, 1, h, w])\n",
        "\n",
        "    if dh < h:\n",
        "        kernel_h = lanczos(ramp(dh / h, 2), 2).to(input.device, input.dtype)\n",
        "        pad_h = (kernel_h.shape[0] - 1) // 2\n",
        "        input = F.pad(input, (0, 0, pad_h, pad_h), 'reflect')\n",
        "        input = F.conv2d(input, kernel_h[None, None, :, None])\n",
        "\n",
        "    if dw < w:\n",
        "        kernel_w = lanczos(ramp(dw / w, 2), 2).to(input.device, input.dtype)\n",
        "        pad_w = (kernel_w.shape[0] - 1) // 2\n",
        "        input = F.pad(input, (pad_w, pad_w, 0, 0), 'reflect')\n",
        "        input = F.conv2d(input, kernel_w[None, None, None, :])\n",
        "\n",
        "    input = input.reshape([n, c, h, w])\n",
        "    return F.interpolate(input, size, mode='bicubic', align_corners=align_corners)\n",
        "\n",
        "class MakeCutouts(nn.Module):\n",
        "    def __init__(self, cut_size, cutn, skip_augs=False):\n",
        "        super().__init__()\n",
        "        self.cut_size = cut_size\n",
        "        self.cutn = cutn\n",
        "        self.skip_augs = skip_augs\n",
        "        self.augs = T.Compose([\n",
        "            T.RandomHorizontalFlip(p=0.5),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.RandomAffine(degrees=15, translate=(0.1, 0.1)),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.RandomPerspective(distortion_scale=0.4, p=0.7),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.RandomGrayscale(p=0.15),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            # T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),\n",
        "        ])\n",
        "\n",
        "    def forward(self, input):\n",
        "        input = T.Pad(input.shape[2]//4, fill=0)(input)\n",
        "        sideY, sideX = input.shape[2:4]\n",
        "        max_size = min(sideX, sideY)\n",
        "\n",
        "        cutouts = []\n",
        "        for ch in range(cutn):\n",
        "            if ch > cutn - cutn//4:\n",
        "                cutout = input.clone()\n",
        "            else:\n",
        "                size = int(max_size * torch.zeros(1,).normal_(mean=.8, std=.3).clip(float(self.cut_size/max_size), 1.))\n",
        "                offsetx = torch.randint(0, abs(sideX - size + 1), ())\n",
        "                offsety = torch.randint(0, abs(sideY - size + 1), ())\n",
        "                cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n",
        "\n",
        "            if not self.skip_augs:\n",
        "                cutout = self.augs(cutout)\n",
        "            cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))\n",
        "            del cutout\n",
        "\n",
        "        cutouts = torch.cat(cutouts, dim=0)\n",
        "        return cutouts\n",
        "\n",
        "\n",
        "def spherical_dist_loss(x, y):\n",
        "    x = F.normalize(x, dim=-1)\n",
        "    y = F.normalize(y, dim=-1)\n",
        "    return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)\n",
        "\n",
        "\n",
        "def tv_loss(input):\n",
        "    \"\"\"L2 total variation loss, as in Mahendran et al.\"\"\"\n",
        "    input = F.pad(input, (0, 1, 0, 1), 'replicate')\n",
        "    x_diff = input[..., :-1, 1:] - input[..., :-1, :-1]\n",
        "    y_diff = input[..., 1:, :-1] - input[..., :-1, :-1]\n",
        "    return (x_diff**2 + y_diff**2).mean([1, 2, 3])\n",
        "\n",
        "\n",
        "def range_loss(input):\n",
        "    return (input - input.clamp(-1, 1)).pow(2).mean([1, 2, 3])\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "wXeOEMYHdtHt"
      },
      "source": [
        "#@title Settings for this notebook:\n",
        "Input:\n",
        "\n",
        "\n",
        "\n",
        "input_dir = f'Blender-Write-To-Server/wts29/rend'\n",
        "output_dir = f'diffNotebook/06'\n",
        "# onionSkinning=True\n",
        "frameNum=1"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "X5gODNAMEUCR"
      },
      "source": [
        "frameNum=1\n",
        "\n",
        "def do_run():\n",
        "    global frameNum\n",
        "    loss_values = []\n",
        " \n",
        "    if seed is not None:\n",
        "        np.random.seed(seed)\n",
        "        random.seed(seed)\n",
        "        torch.manual_seed(seed)\n",
        "        torch.cuda.manual_seed_all(seed)\n",
        "        torch.backends.cudnn.deterministic = True\n",
        " \n",
        "    make_cutouts = MakeCutouts(clip_size, cutn, skip_augs=skip_augs)\n",
        "    target_embeds, weights = [], []\n",
        " \n",
        "    for prompt in text_prompts:\n",
        "        txt, weight = parse_prompt(prompt)\n",
        "        txt = clip_model.encode_text(clip.tokenize(prompt).to(device)).float()\n",
        " \n",
        "        if fuzzy_prompt:\n",
        "            for i in range(25):\n",
        "                target_embeds.append((txt + torch.randn(txt.shape).cuda() * rand_mag).clamp(0,1))\n",
        "                weights.append(weight)\n",
        "        else:\n",
        "            target_embeds.append(txt)\n",
        "            weights.append(weight)\n",
        " \n",
        "    for prompt in image_prompts:\n",
        "        path, weight = parse_prompt(prompt)\n",
        "        img = Image.open( f'/content/drive/MyDrive/{input_dir}/{frameNum+1:04}.png').convert('RGB')\n",
        "        img = TF.resize(img, min(side_x, side_y, *img.size), T.InterpolationMode.LANCZOS)\n",
        "        batch = make_cutouts(TF.to_tensor(img).to(device).unsqueeze(0).mul(2).sub(1))\n",
        "        embed = clip_model.encode_image(normalize(batch)).float()\n",
        "        if fuzzy_prompt:\n",
        "            for i in range(25):\n",
        "                target_embeds.append((embed + torch.randn(embed.shape).cuda() * rand_mag).clamp(0,1))\n",
        "                weights.extend([weight / cutn] * cutn)\n",
        "        else:\n",
        "            target_embeds.append(embed)\n",
        "            weights.extend([weight / cutn] * cutn)\n",
        " \n",
        "    target_embeds = torch.cat(target_embeds)\n",
        "    weights = torch.tensor(weights, device=device)\n",
        "    if weights.sum().abs() < 1e-3:\n",
        "        raise RuntimeError('The weights must not sum to 0.')\n",
        "    weights /= weights.sum().abs()\n",
        " \n",
        "    init = None\n",
        "    if init_image is not None:\n",
        "        init = Image.open( f'/content/drive/MyDrive/{input_dir}/{frameNum+1:04}.png').convert('RGB')\n",
        "        init = init.resize((side_x, side_y), Image.LANCZOS)\n",
        "        init = TF.to_tensor(init).to(device).unsqueeze(0).mul(2).sub(1)\n",
        "    if perlin_init:\n",
        "        if perlin_mode == 'color':\n",
        "            init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, False)\n",
        "            init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, False)\n",
        "        elif perlin_mode == 'gray':\n",
        "           init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, True)\n",
        "           init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, True)\n",
        "        else:\n",
        "           init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, False)\n",
        "           init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, True)\n",
        "        \n",
        "        # init = TF.to_tensor(init).add(TF.to_tensor(init2)).div(2).to(device)\n",
        "        init = TF.to_tensor(init).add(TF.to_tensor(init2)).div(2).to(device).unsqueeze(0).mul(2).sub(1)\n",
        "        del init2\n",
        " \n",
        "    cur_t = None\n",
        " \n",
        "    def cond_fn(x, t, y=None):\n",
        "        with torch.enable_grad():\n",
        "            x = x.detach().requires_grad_()\n",
        "            n = x.shape[0]\n",
        "            my_t = torch.ones([n], device=device, dtype=torch.long) * cur_t\n",
        "            out = diffusion.p_mean_variance(model, x, my_t, clip_denoised=False, model_kwargs={'y': y})\n",
        "            fac = diffusion.sqrt_one_minus_alphas_cumprod[cur_t]\n",
        "            x_in = out['pred_xstart'] * fac + x * (1 - fac)\n",
        "            x_in_grad = torch.zeros_like(x_in)\n",
        "            for i in range(cutn_batches):\n",
        "                clip_in = normalize(make_cutouts(x_in.add(1).div(2)))\n",
        "                image_embeds = clip_model.encode_image(clip_in).float()\n",
        "                dists = spherical_dist_loss(image_embeds.unsqueeze(1), target_embeds.unsqueeze(0))\n",
        "                dists = dists.view([cutn, n, -1])\n",
        "                losses = dists.mul(weights).sum(2).mean(0)\n",
        "                loss_values.append(losses.sum().item()) # log loss, probably shouldn't do per cutn_batch\n",
        "                x_in_grad += torch.autograd.grad(losses.sum() * clip_guidance_scale, x_in)[0] / cutn_batches\n",
        "            tv_losses = tv_loss(x_in)\n",
        "            range_losses = range_loss(out['pred_xstart'])\n",
        "            sat_losses = torch.abs(x_in - x_in.clamp(min=-1,max=1)).mean()\n",
        "            loss = tv_losses.sum() * tv_scale + range_losses.sum() * range_scale + sat_losses.sum() * sat_scale\n",
        "            if init is not None and init_scale:\n",
        "                init_losses = lpips_model(x_in, init)\n",
        "                loss = loss + init_losses.sum() * init_scale\n",
        "            x_in_grad += torch.autograd.grad(loss, x_in)[0]\n",
        "            grad = -torch.autograd.grad(x_in, x, x_in_grad)[0]\n",
        "        if clamp_grad:\n",
        "            magnitude = grad.square().mean().sqrt()\n",
        "            return grad * magnitude.clamp(max=0.05) / magnitude\n",
        "        return grad\n",
        " \n",
        "    if model_config['timestep_respacing'].startswith('ddim'):\n",
        "        sample_fn = diffusion.ddim_sample_loop_progressive\n",
        "    else:\n",
        "        sample_fn = diffusion.p_sample_loop_progressive\n",
        " \n",
        "    for i in range(n_batches):\n",
        "        frameNum=i+1\n",
        "        cur_t = diffusion.num_timesteps - skip_timesteps - 1\n",
        " \n",
        "        i_i =TF.to_tensor(Image.open( f'/content/drive/MyDrive/{input_dir}/{frameNum+1:04}.png').convert('RGB').resize((side_x, side_y), Image.LANCZOS)).to(device).unsqueeze(0).mul(2).sub(1)\n",
        "   \n",
        "   \n",
        "        # if frameNum>1 and onionSkinning == True:\n",
        "        #     img2 =TF.to_tensor(Image.open(f'/content/drive/MyDrive/{input_dir}/{frameNum:04}.png').convert('RGB').resize((side_x, side_y), Image.LANCZOS)).to(device).unsqueeze(0).mul(2).sub(1)\n",
        "        #     i_i = torch.mean(torch.stack([i_i, img2]), dim=0)\n",
        "\n",
        "        if model_config['timestep_respacing'].startswith('ddim'):\n",
        "            samples = sample_fn(\n",
        "                model,\n",
        "                (batch_size, 3, side_y, side_x),\n",
        "                clip_denoised=clip_denoised,\n",
        "                model_kwargs={},\n",
        "                cond_fn=cond_fn,\n",
        "                progress=True,\n",
        "                skip_timesteps=skip_timesteps,\n",
        "                init_image=i_i,\n",
        "                randomize_class=randomize_class,\n",
        "                eta=eta,\n",
        "            )\n",
        "        else:\n",
        "            samples = sample_fn(\n",
        "                model,\n",
        "                (batch_size, 3, side_y, side_x),\n",
        "                clip_denoised=clip_denoised,\n",
        "                model_kwargs={},\n",
        "                cond_fn=cond_fn,\n",
        "                progress=True,\n",
        "                skip_timesteps=skip_timesteps,\n",
        "                init_image=i_i,\n",
        "                randomize_class=randomize_class,\n",
        "            )\n",
        "\n",
        "        for j, sample in enumerate(samples):\n",
        "            display.clear_output(wait=True)\n",
        "            cur_t -= 1\n",
        "            if j % display_rate == 0 or cur_t == -1: # get rid of j % display_rate == 0 or\n",
        "                    for k, image in enumerate(sample['pred_xstart']):\n",
        "                        tqdm.write(f'Batch {i}, step {j}, output {k}:')\n",
        "                        current_time = datetime.now().strftime('%y%m%d-%H%M%S_%f')\n",
        "                        #filename = f'progress_batch{i:05}_iteration{j:05}_output{k:05}_{current_time}.png' #change this to save to a temp image, and replace each time\n",
        "                        filename = f'{frameNum:04}.png'\n",
        "                        image = TF.to_pil_image(image.add(1).div(2).clamp(0, 1))\n",
        "                        #if j % 10 == 0:\n",
        "                        image.save(f'/content/drive/MyDrive/{output_dir}/output/' + filename)\n",
        "                        display.display(display.Image(f'/content/drive/MyDrive/{output_dir}/output/' + filename))\n",
        "                        # if google_drive and cur_t == -1:\n",
        "                        #     image.save('/content/drive/MyDrive/' + filename)\n",
        " \n",
        "        plt.plot(np.array(loss_values), 'r')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CQVtY1Ixnqx4"
      },
      "source": [
        "# Load Diffusion and CLIP models"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Fpbody2NCR7w"
      },
      "source": [
        "timestep_respacing = 'ddim50' # Modify this value to decrease the number of timesteps.\n",
        "# timestep_respacing = '25'\n",
        "diffusion_steps = 1000\n",
        "diffusion_model = '512x512_diffusion_uncond_finetune_008100'\n",
        "\n",
        "model_config = model_and_diffusion_defaults()\n",
        "if diffusion_model == '512x512_diffusion_uncond_finetune_008100':\n",
        "    model_config.update({\n",
        "        'attention_resolutions': '32, 16, 8',\n",
        "        'class_cond': False,\n",
        "        'diffusion_steps': diffusion_steps,\n",
        "        'rescale_timesteps': True,\n",
        "        'timestep_respacing': timestep_respacing,\n",
        "        'image_size': 512,\n",
        "        'learn_sigma': True,\n",
        "        'noise_schedule': 'linear',\n",
        "        'num_channels': 256,\n",
        "        'num_head_channels': 64,\n",
        "        'num_res_blocks': 2,\n",
        "        'resblock_updown': True,\n",
        "        'use_fp16': True,\n",
        "        'use_scale_shift_norm': True,\n",
        "    })\n",
        "elif diffusion_model == '256x256_diffusion_uncond':\n",
        "    model_config.update({\n",
        "        'attention_resolutions': '32, 16, 8',\n",
        "        'class_cond': False,\n",
        "        'diffusion_steps': diffusion_steps,\n",
        "        'rescale_timesteps': True,\n",
        "        'timestep_respacing': timestep_respacing,\n",
        "        'image_size': 256,\n",
        "        'learn_sigma': True,\n",
        "        'noise_schedule': 'linear',\n",
        "        'num_channels': 256,\n",
        "        'num_head_channels': 64,\n",
        "        'num_res_blocks': 2,\n",
        "        'resblock_updown': True,\n",
        "        'use_fp16': True,\n",
        "        'use_scale_shift_norm': True,\n",
        "    })\n",
        "side_x = side_y = model_config['image_size']\n",
        "\n",
        "model, diffusion = create_model_and_diffusion(**model_config)\n",
        "model.load_state_dict(torch.load(f'/content/drive/MyDrive/diffNotebook/512x512_diffusion_uncond_finetune_008100.pt', map_location='cpu'))\n",
        "model.requires_grad_(False).eval().to(device)\n",
        "for name, param in model.named_parameters():\n",
        "    if 'qkv' in name or 'norm' in name or 'proj' in name:\n",
        "        param.requires_grad_()\n",
        "if model_config['use_fp16']:\n",
        "    model.convert_to_fp16()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "VnQjGugaDZPJ"
      },
      "source": [
        "clip_model = clip.load('ViT-B/16', jit=False)[0].eval().requires_grad_(False).to(device)\n",
        "clip_size = clip_model.visual.input_resolution\n",
        "normalize = T.Normalize(mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711])\n",
        "lpips_model = lpips.LPIPS(net='vgg').to(device)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9zY-8I90LkC6"
      },
      "source": [
        "# Settings"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "U0PwzFZbLfcy"
      },
      "source": [
        "# frameNum=320\n",
        "text_prompts = [\n",
        "    #  'another missed concert poster by Theophile Steinlen',\n",
        "    #  'this is the work of a moth by Maria Sibylla Merian',\n",
        "    #  'flower wall by Rachel Ruysch',\n",
        "]\n",
        "image_prompts = [\n",
        "    # 'mona.jpg',\n",
        "   #\n",
        "]\n",
        "\n",
        "# 350/50/50/32 and 500/0/0/64 have worked well for 25 timesteps on 256px\n",
        "# Also, sometimes 1 cutn actually works out fine\n",
        "\n",
        "clip_guidance_scale =2000 # 1000 - Controls how much the image should look like the prompt.\n",
        "tv_scale = 150 # 150 - Controls the smoothness of the final output.\n",
        "range_scale = 150 # 150 - Controls how far out of range RGB values are allowed to be.\n",
        "sat_scale = 0 # 0 - Controls how much saturation is allowed. From nshepperd's JAX notebook.\n",
        "cutn = 16 # 16 - Controls how many crops to take from the image.\n",
        "cutn_batches = 2 # 2 - Accumulate CLIP gradient from multiple batches of cuts [Can help with OOM errors / Low VRAM]\n",
        "\n",
        "init_image = f'/content/drive/MyDrive/{input_dir}/{frameNum+1:04}.png' # None - URL or local path\n",
        "init_scale = 1000 # 0 - This enhances the effect of the init image, a good value is 1000\n",
        "skip_timesteps = 23 # 0 - Controls the starting point along the diffusion timesteps\n",
        "perlin_init = False # False - Option to start with random perlin noise\n",
        "perlin_mode = 'mixed' # 'mixed' ('gray', 'color')\n",
        "\n",
        "skip_augs = False # False - Controls whether to skip torchvision augmentations\n",
        "randomize_class = True # True - Controls whether the imagenet class is randomly changed each iteration\n",
        "clip_denoised = False # False - Determines whether CLIP discriminates a noisy or denoised image\n",
        "clamp_grad = True # True - Experimental: Using adaptive clip grad in the cond_fn\n",
        "\n",
        "seed = 3238143300 #random.randint(0, 2**32)\n",
        "# seed = random.randint(0, 2**32) # Choose a random seed and print it at end of run for reproduction\n",
        "\n",
        "fuzzy_prompt = False # False - Controls whether to add multiple noisy prompts to the prompt losses\n",
        "rand_mag = 0.01 # 0.1 - Controls the magnitude of the random noise\n",
        "eta = 0.5 # 0.0 - c"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nf9hTc8YLoLx"
      },
      "source": [
        "# Diffuse!"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "LHLiO56OfwgD"
      },
      "source": [
        "display_rate = 1\n",
        "n_batches = 360 # 1 - Controls how many consecutive batches of images are generated\n",
        "batch_size = 1 # 1 - Controls how many images are generated in parallel in a batch\n",
        "\n",
        "gc.collect()\n",
        "torch.cuda.empty_cache()\n",
        "try:\n",
        "    do_run()\n",
        "except KeyboardInterrupt:\n",
        "    pass\n",
        "finally:\n",
        "    print('seed', seed)\n",
        "    gc.collect()\n",
        "    torch.cuda.empty_cache()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4sjiHy2ygOJX"
      },
      "source": [
        "#@title Download project & model\n",
        "# #@title Install Real-ESRGAN\n",
        "# # Clone Real-ESRGAN and enter the Real-ESRGAN\n",
        "!git clone https://github.com/xinntao/Real-ESRGAN.git\n",
        "%cd Real-ESRGAN\n",
        "# Set up the environment\n",
        "!pip install basicsr\n",
        "!pip install -r requirements.txt\n",
        "!python setup.py develop\n",
        "# Download the pre-trained model\n",
        "\n",
        "!wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth -P experiments/pretrained_models"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "lkV1WLPBYoyp"
      },
      "source": [
        "import gc\n",
        "gc.collect()\n",
        "torch.cuda.empty_cache()\n",
        "!python inference_realesrgan.py --model_path experiments/pretrained_models/RealESRGAN_x4plus.pth --input /content/drive/MyDrive/diffNotebook/02/output --output /content/drive/MyDrive/diffNotebook/02/large --netscale 4 --outscale 2 --half"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "_AZL2QfT9w25"
      },
      "source": [
        "!ffmpeg -y -vcodec png -r 12 -start_number 1 -i /content/drive/MyDrive/diffNotebook/01/large/%04d_out.png -c:v libx264 -vf fps=12 -pix_fmt yuv420p -crf 17 -preset veryslow /content/drive/MyDrive/diffNotebook/01/stitched.mp4"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DUCUlQvt9g_S"
      },
      "source": [
        "# @title **Download Super-Slomo model**\n",
        "!git clone -q --depth 1 https://github.com/avinashpaliwal/Super-SloMo.git"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "vfBtU6ai9wOr"
      },
      "source": [
        "from os.path import exists\n",
        "def download_from_google_drive(file_id, file_name):\n",
        "  # download a file from the Google Drive link\n",
        "  !rm -f ./cookie\n",
        "  !curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id={file_id}\" > /dev/null\n",
        "  confirm_text = !awk '/download/ {print $NF}' ./cookie\n",
        "  confirm_text = confirm_text[0]\n",
        "  !curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm={confirm_text}&id={file_id}\" -o {file_name}\n",
        "  \n",
        "pretrained_model = 'SuperSloMo.ckpt'\n",
        "if not exists(pretrained_model):\n",
        "  download_from_google_drive('1IvobLDbRiBgZr3ryCRrWL8xDbMZ-KnpF', pretrained_model)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3OrRp7jC9pi6"
      },
      "source": [
        "!python /content/Real-ESRGAN/Super-SloMo/video_to_slomo.py --checkpoint /content/Real-ESRGAN/SuperSloMo.ckpt --video /content/drive/MyDrive/{output_dir}/stitched.mp4 --sf \"3\" --fps \"12\" --output /content/drive/MyDrive/{output_dir}/slomo.mkv"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NiRHmIotIjuQ"
      },
      "source": [
        "!ffmpeg -i /content/drive/MyDrive/diffNotebook/01/slomo.mkv /content/drive/MyDrive/diffNotebook/01/final.mp4"
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}