{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"collapsed_sections": [
"1T1MdvXN0WFG",
"qzuxtUNb3gXR",
"jYNXZJE55MWv",
"EyVSupKzVbHK",
"1qowQfIC6mqn",
"7wPAZq0QNINE"
],
"gpuType": "T4",
"authorship_tag": "ABX9TyP3bF7K8VQ55w7nPgdDfVYt",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/losandes/b688b2b7473733d3f9f4cb0c1002e1db/prompt-engineering-with-diffusion-models.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Prompt Engineering with Diffusion Models (Text to Image Generation)\n",
"\n",
"This notebook provides tooling to explore prompt engineering: creating precise and informative instructions to acquire desired outputs from Diffusion models, using Hugging Face Diffusion Pipelines.\n",
"\n",
"This notebook is a compliment to [Diffusion Bench](https://github.com/losandes/diffusion-bench), is simpler than Diffusion Bench (this runs in the cloud and doesn't run batches), and is easy to modify. The README for Diffusion Bench provides examples that compliment this notebook, so if you're interested in this notebook, you should check that out too.\n",
"\n",
"If you aren't familiar with Hugging Face, text-to-image generation, or Diffusion models, the [Hugging Face NLP Course](https://huggingface.co/learn/nlp-course/chapter0/1) is a helpful introduction to what is explored here.\n",
"\n",
"_Note that GPUs are a limited resource, both in what is allocated to you and how long you can use them. Google doesn't provide much insight as to the limitations or when you are nearing your limits. I actually ran out of resources writing this notebook!_"
],
"metadata": {
"id": "CDUjsVKpy7Hm"
}
},
{
"cell_type": "markdown",
"source": [
"## Setup\n",
"\n",
"**IMPORTANT: You need to run the Code in each section, in the order that it appears, each time you connect to a runtime.**\n",
"\n",
"To get started with this notebook, click \"File\", and \"Save a copy\" to a location of your choosing.\n",
"\n",
"This is an Image to Text notebook, so choose the T4 GPU (top right corner, click the down arrow next to RAM and Disk, choose \"change runtime type\", and choose T4 GPU).\n",
"\n",
"Expect different results from different processors. So if you choose something other than a T4 GPU, or if that is no longer available, the storylines at the bottom (try this... then that) may not make sense.\n",
"\n",
"The cell in this section installs the python packages that are used in this notebook."
],
"metadata": {
"id": "1T1MdvXN0WFG"
}
},
{
"cell_type": "code",
"source": [
"!pip install transformers[sentencepiece]\n",
"!pip install diffusers --upgrade\n",
"!pip install invisible_watermark accelerate safetensors ipyplot torch scipy"
],
"metadata": {
"id": "_bENZzJe0ZKt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Predictable Tensor Generation\n",
"\n",
"To explore the effects of changing prompts and variables, we need to be able to repeat results.\n",
"\n",
"This notebook uses [torch.randn](https://pytorch.org/docs/stable/generated/torch.randn.html) to generate a tensor filled with random numbers. It uses [torch.Generator](https://pytorch.org/docs/stable/generated/torch.Generator.html) to manage random number generation so that we can produce predictable tensors when providing a manual seed.[^1]\n",
"\n",
"These tensors will then be passed as an argument to the Diffusion Pipeline.\n",
"\n",
"[^1]: You can read more about generating tensors in [Using torch.randn and torch.randn_like to create Random Tensors in PyTorch](https://machinelearningknowledge.ai/using-torch-randn-and-torch-randn_like-to-create-random-tensors-in-pytorch/)"
],
"metadata": {
"id": "qzuxtUNb3gXR"
}
},
{
"cell_type": "code",
"source": [
"import torch\n",
"\n",
"NONE = None\n",
"\n",
"def make_latents (\n",
" device,\n",
" in_channels,\n",
" width=768,\n",
" height=768,\n",
" seed=NONE,\n",
"):\n",
" \"\"\"\n",
" @see https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb\n",
" \"\"\"\n",
" # print(f\"device: {device}\")\n",
" # print(f\"in_channels: {in_channels}\")\n",
" # print(f\"width: {width}\")\n",
" # print(f\"height: {height}\")\n",
" # print(f\"seed: {seed}\")\n",
" # print(f\"together: {(1, in_channels, height // 8, width // 8)}\")\n",
"\n",
" generator = torch.Generator(device=device)\n",
" seed = seed if seed is not None else generator.seed()\n",
" generator = generator.manual_seed(seed)\n",
"\n",
" return [\n",
" seed,\n",
" torch.randn(\n",
" (1, in_channels, height // 8, width // 8),\n",
" generator = generator,\n",
" device = device,\n",
" dtype=torch.float16,\n",
" ),\n",
" ]\n",
"\n",
" # latents should have shape like (4, 4, 64, 64)\n",
" # latents.shape\n"
],
"metadata": {
"id": "bROtNMs34lU2"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Diffusion Pipelines\n",
"\n",
"The image text-to-image generation is performed by Hugging Face Pipelines. The code in this section configures the pipelines that we'll use for image generation.\n",
"\n",
"Two models are provided:\n",
"\n",
"- [Dreamlike Photoreal 2](https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0)\n",
"- [Stable Diffusion XL Base](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)\n",
"\n",
"Try adding another! (e.g. [Analog Diffusion](https://huggingface.co/wavymulder/Analog-Diffusion) or [Openjourney](https://huggingface.co/prompthero/openjourney))\n",
"\n",
"_Note that, when adding a new pipeline, it's important to keep the same dtype (torch_dtype=torch.float16)_"
],
"metadata": {
"id": "jYNXZJE55MWv"
}
},
{
"cell_type": "markdown",
"source": [
"### Dreamlike Photoreal 2\n",
"\n",
"Create a pipeline that uses [Dreamlike Photoreal 2](https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0)"
],
"metadata": {
"id": "byXY1iSvnF0k"
}
},
{
"cell_type": "code",
"source": [
"from diffusers import StableDiffusionPipeline\n",
"import torch\n",
"\n",
"def dp2_pipeline (device):\n",
" \"\"\"\n",
" @see https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0\n",
"\n",
" This model was trained on 768x768px images, so use:\n",
" - 768x768px\n",
" - 640x896px\n",
" - 896x640px\n",
" - 768x1024px\n",
" - 1024x768px\n",
" \"\"\"\n",
" model_id = \"dreamlike-art/dreamlike-photoreal-2.0\"\n",
" pipe = StableDiffusionPipeline.from_pretrained(\n",
" model_id,\n",
" use_safetensors=True,\n",
" torch_dtype=torch.float16,\n",
" ).to(device)\n",
"\n",
" return [model_id, pipe]\n"
],
"metadata": {
"id": "TAU9V7Je5XET"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Stable Diffusion XL\n",
"\n",
"Create a pipeline that uses [Stable Diffusion XL Base](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)"
],
"metadata": {
"id": "8xrQaSxvnPtX"
}
},
{
"cell_type": "code",
"source": [
"from diffusers import DiffusionPipeline\n",
"\n",
"def sdxl_pipeline (device):\n",
" \"\"\"\n",
" @see https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0\n",
" \"\"\"\n",
" model_id = \"stabilityai/stable-diffusion-xl-base-1.0\"\n",
" pipe = DiffusionPipeline.from_pretrained(\n",
" model_id,\n",
" use_safetensors=True,\n",
" torch_dtype=torch.float16,\n",
" ).to(device)\n",
"\n",
" return [model_id, pipe]\n"
],
"metadata": {
"id": "N0GrETSBncRZ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Text to Image Generator\n",
"\n",
"The code in this section composes the latent generation and pipeline execution to produce an image.\n",
"\n"
],
"metadata": {
"id": "EyVSupKzVbHK"
}
},
{
"cell_type": "code",
"source": [
"def txt2img (pipeline, device, manual_seed):\n",
" \"\"\"\n",
" Produces a function that generates an image from text\n",
"\n",
" Returns: (prompt, width, height, manual_seed, **kwargs) => Image\n",
" \"\"\"\n",
"\n",
" def _txt2img (prompt, **kwargs):\n",
" \"\"\"\n",
" Generates an image from text\n",
"\n",
" Returns: Image\n",
" \"\"\"\n",
"\n",
" if not 'width' in kwargs or kwargs['width'] is None:\n",
" print(f\"expected int, actual width: {kwargs['width']}\")\n",
" raise Exception(\"width is required\")\n",
"\n",
" if not 'height' in kwargs or kwargs['height'] is None:\n",
" print(f\"expected int, actual height: {kwargs['height']}\")\n",
" raise Exception(\"height is required\")\n",
"\n",
" [model_id, pipe] = pipeline(device)\n",
" [seed, latents] = make_latents(\n",
" device,\n",
" pipe.unet.config.in_channels,\n",
" width=kwargs['width'],\n",
" height=kwargs['height'],\n",
" seed=manual_seed,\n",
" )\n",
"\n",
" result = pipe(\n",
" prompt,\n",
" latents=latents,\n",
" **kwargs\n",
" )\n",
"\n",
" return [result, model_id, seed]\n",
" return _txt2img"
],
"metadata": {
"id": "DzjGwLpuVfVZ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Practice Prompt Engineering\n",
"\n",
"Following is the code you'll work with to explore prompt engineering with diffusion models. Look below the code for suggestions on how to get started."
],
"metadata": {
"id": "1qowQfIC6mqn"
}
},
{
"cell_type": "code",
"source": [
"WIDTH = 896\n",
"HEIGHT = 640\n",
"STEPS = 10 # try integers between 10 and 150\n",
"DEVICE = \"cuda\"\n",
" # use `None` as the value for SEED to generate a new seed\n",
"SEED = 6886045143854674\n",
"PROMPT = \"a painting of a woman talking with a robot\"\n",
"NEGATIVE_PROMPT = \"\"\n",
"GUIDANCE_SCALE = 7.5\n",
"PIPELINE = dp2_pipeline # dp2_pipeline | sdxl_pipeline\n",
"\n",
"[result, model_id, seed] = txt2img(PIPELINE, DEVICE, SEED)(\n",
" PROMPT,\n",
" negative_prompt=NEGATIVE_PROMPT,\n",
" num_inference_steps=STEPS,\n",
" width=WIDTH,\n",
" height=HEIGHT,\n",
" guidance_scale=GUIDANCE_SCALE\n",
")\n",
"\n",
"print(\"\")\n",
"print(f\"PROMPT: {PROMPT}\")\n",
"print(f\"NEGATIVE_PROMPT: {NEGATIVE_PROMPT}\")\n",
"print(f\"STEPS: {STEPS}\")\n",
"print(f\"WIDTH: {WIDTH}\")\n",
"print(f\"HEIGHT: {HEIGHT}\")\n",
"print(f\"DEVICE: {DEVICE}\")\n",
"print(f\"MODEL_ID: {model_id}\")\n",
"print(f\"SEED: {seed}\")\n",
"print(\"\")\n",
"result.images[0]"
],
"metadata": {
"id": "5h-H58or6zRY"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Refine with Prompts\n",
"\n",
"How does changing a prompt change the image?\n",
"\n",
"**Try**:\n",
"\n",
"```py\n",
"STEPS = 10\n",
"SEED = 6886045143854674\n",
"PROMPT = \"a painting of a woman talking with a robot\"\n",
"```\n",
"\n",
"Fun... mangled hands though. Hands are difficult. We could try to remove them with negative prompts, but let's keep working on the prompt for now.\n",
"\n",
"Add an artist, and change the number of steps to 20:\n",
"\n",
"```py\n",
"STEPS = 20\n",
"SEED = 6886045143854674\n",
"PROMPT = \"a painting of a woman talking with a robot, by Alphonse Mucha\"\n",
"```\n",
"\n",
"In this example, adding the artist produces a significant change. Let's instruct it to change the style:\n",
"\n",
"```py\n",
"STEPS = 20\n",
"SEED = 6886045143854674\n",
"PROMPT = \"a painting of a woman talking with a robot, intricate, elegant, trending on artstation, masterpiece, by Alphonse Mucha\"\n",
"```\n",
"\n",
"What happens if we change the artist now?\n",
"\n",
"```py\n",
"STEPS = 20\n",
"SEED = 6886045143854674\n",
"PROMPT = \"a painting of a woman talking with a robot, intricate, elegant, trending on artstation, masterpiece, by Sam Spratt\"\n",
"```\n",
"\n",
"\n"
],
"metadata": {
"id": "dS5gy-q0HlJQ"
}
},
{
"cell_type": "markdown",
"source": [
"### Refine with Negative Prompts\n",
"\n",
"How does a negative prompt change the image?\n",
"\n",
"**Try**:\n",
"\n",
"```py\n",
"SEED = 129324426782117\n",
"PROMPT = \"a cute, tabby cat\"\n",
"NEGATIVE_PROMPT = \"\"\n",
"```\n",
"\n",
"**Followed by**:\n",
"\n",
"```py\n",
"SEED = 129324426782117\n",
"PROMPT = \"a cute, tabby cat\"\n",
"NEGATIVE_PROMPT = \"not blurry\"\n",
"```\n",
"\n",
"How does the negative prompt change the image?\n",
"\n",
"\n",
"**Try**:\n",
"\n",
"```py\n",
"SEED = 4009664425088305\n",
"PROMPT = \"a cute, tabby cat\"\n",
"NEGATIVE_PROMPT = \"\"\n",
"```\n",
"\n",
"Hmmm. We asked for one cat, but got two. They're cute, but they aren't what we asked for. Let's see if we can solve that with a negative prompt.\n",
"\n",
"**Try**:\n",
"\n",
"```py\n",
"SEED = 4009664425088305\n",
"PROMPT = \"a cute, tabby cat\"\n",
"NEGATIVE_PROMPT = \"two cats, multiple cats\"\n",
"```\n",
"\n",
"Was this as effective as \"not blurry\" was in the previous attempts? A seed can be stubborn. How might we use that to our advantage?\n",
"\n",
"**Try**:\n",
"\n",
"```py\n",
"SEED = 4009664425088305\n",
"PROMPT = \"a cute, australian sheppherd\"\n",
"NEGATIVE_PROMPT = \"\"\n",
"```\n",
"\n",
"Now there are even more animals! This seed likes groups. What's happening with the nose on the dog on the left?\n",
"\n",
"**Let's see if we can fix that**:\n",
"\n",
"```py\n",
"SEED = 4009664425088305\n",
"PROMPT = \"a cute, australian sheppherd\"\n",
"NEGATIVE_PROMPT = \"extra noses, mangled faces\"\n",
"```\n",
"\n",
"awwwww"
],
"metadata": {
"id": "Pf_anwgHIBRX"
}
},
{
"cell_type": "markdown",
"source": [
"## Demonstrate What You Learned\n",
"\n",
"Use the following cell to generate an image that you will refine. Once you find the image you will refine, move to the next code cell and leave the image in this cell, so I can see it when you share this notebook with me.\n",
"\n",
"Consider adding notes / text cells to share any context you think is worth sharing or remembering."
],
"metadata": {
"id": "7wPAZq0QNINE"
}
},
{
"cell_type": "code",
"source": [
"WIDTH = 896\n",
"HEIGHT = 640\n",
"STEPS = 10 # try integers between 10 and 150\n",
"DEVICE = \"cuda\"\n",
" # use `None` as the value for SEED to generate a new seed\n",
"SEED = None\n",
"PROMPT = \"\"\n",
"NEGATIVE_PROMPT = \"\"\n",
"GUIDANCE_SCALE = 7.5\n",
"PIPELINE = dp2_pipeline # dp2_pipeline | sdxl_pipeline\n",
"\n",
"[result, model_id, seed] = txt2img(PIPELINE, DEVICE, SEED)(\n",
" PROMPT,\n",
" negative_prompt=NEGATIVE_PROMPT,\n",
" num_inference_steps=STEPS,\n",
" width=WIDTH,\n",
" height=HEIGHT,\n",
" guidance_scale=GUIDANCE_SCALE\n",
")\n",
"\n",
"print(\"\")\n",
"print(f\"PROMPT: {PROMPT}\")\n",
"print(f\"NEGATIVE_PROMPT: {NEGATIVE_PROMPT}\")\n",
"print(f\"STEPS: {STEPS}\")\n",
"print(f\"WIDTH: {WIDTH}\")\n",
"print(f\"HEIGHT: {HEIGHT}\")\n",
"print(f\"DEVICE: {DEVICE}\")\n",
"print(f\"MODEL_ID: {model_id}\")\n",
"print(f\"SEED: {seed}\")\n",
"print(\"\")\n",
"result.images[0]"
],
"metadata": {
"id": "3RHCLvPbNS5V"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Refine with Prompts\n",
"\n",
"Using just the prompt, refine the image you generatedd above. Remember to set the seed. You can also change the number of steps and the guidance scale here, if you wish (the guidance scale instructs the model how much to follow your prompt)."
],
"metadata": {
"id": "1obRh7ebNtbL"
}
},
{
"cell_type": "code",
"source": [
"WIDTH = 896\n",
"HEIGHT = 640\n",
"STEPS = 10 # try integers between 10 and 150\n",
"DEVICE = \"cuda\"\n",
" # use `None` as the value for SEED to generate a new seed\n",
"SEED = None\n",
"PROMPT = \"\"\n",
"NEGATIVE_PROMPT = \"\"\n",
"GUIDANCE_SCALE = 7.5\n",
"PIPELINE = dp2_pipeline # dp2_pipeline | sdxl_pipeline\n",
"\n",
"[result, model_id, seed] = txt2img(PIPELINE, DEVICE, SEED)(\n",
" PROMPT,\n",
" negative_prompt=NEGATIVE_PROMPT,\n",
" num_inference_steps=STEPS,\n",
" width=WIDTH,\n",
" height=HEIGHT,\n",
" guidance_scale=GUIDANCE_SCALE\n",
")\n",
"\n",
"print(\"\")\n",
"print(f\"PROMPT: {PROMPT}\")\n",
"print(f\"NEGATIVE_PROMPT: {NEGATIVE_PROMPT}\")\n",
"print(f\"STEPS: {STEPS}\")\n",
"print(f\"WIDTH: {WIDTH}\")\n",
"print(f\"HEIGHT: {HEIGHT}\")\n",
"print(f\"DEVICE: {DEVICE}\")\n",
"print(f\"MODEL_ID: {model_id}\")\n",
"print(f\"SEED: {seed}\")\n",
"print(\"\")\n",
"result.images[0]"
],
"metadata": {
"id": "Mm8-qdYCNxmU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Refine with Negative Prompts\n",
"\n",
"Using the negative prompt, further refine the image you generated above."
],
"metadata": {
"id": "QsgUABMKOPzL"
}
},
{
"cell_type": "code",
"source": [
"WIDTH = 896\n",
"HEIGHT = 640\n",
"STEPS = 10 # try integers between 10 and 150\n",
"DEVICE = \"cuda\"\n",
" # use `None` as the value for SEED to generate a new seed\n",
"SEED = None\n",
"PROMPT = \"\"\n",
"NEGATIVE_PROMPT = \"\"\n",
"GUIDANCE_SCALE = 7.5\n",
"PIPELINE = dp2_pipeline # dp2_pipeline | sdxl_pipeline\n",
"\n",
"[result, model_id, seed] = txt2img(PIPELINE, DEVICE, SEED)(\n",
" PROMPT,\n",
" negative_prompt=NEGATIVE_PROMPT,\n",
" num_inference_steps=STEPS,\n",
" width=WIDTH,\n",
" height=HEIGHT,\n",
" guidance_scale=GUIDANCE_SCALE\n",
")\n",
"\n",
"print(\"\")\n",
"print(f\"PROMPT: {PROMPT}\")\n",
"print(f\"NEGATIVE_PROMPT: {NEGATIVE_PROMPT}\")\n",
"print(f\"STEPS: {STEPS}\")\n",
"print(f\"WIDTH: {WIDTH}\")\n",
"print(f\"HEIGHT: {HEIGHT}\")\n",
"print(f\"DEVICE: {DEVICE}\")\n",
"print(f\"MODEL_ID: {model_id}\")\n",
"print(f\"SEED: {seed}\")\n",
"print(\"\")\n",
"result.images[0]"
],
"metadata": {
"id": "b6WMeQRROT6Q"
},
"execution_count": null,
"outputs": []
}
]
}