@anotherjesse
Created October 11, 2022 00:35
Running CLIP to get the prompt embeddings for Stable Diffusion
!pip install diffusers==0.4.0
!pip install transformers ftfy
from transformers import CLIPTokenizer, CLIPTextModel
import torch
torch_device = "cuda" if torch.cuda.is_available() else "cpu" # or just let it be cpu
prompt = ["a photograph of an astronaut riding a horse"]
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
# Move the encoder to the same device the input ids are sent to below
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(torch_device)
text_input = tokenizer(
    prompt,
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
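## text_input holds two tensors: input_ids and attention_mask
## Q: is text_input.input_ids always the same shape, (1, 77)?
## A: for a single prompt, yes: padding="max_length" with truncation=True fixes the length at tokenizer.model_max_length (77 here)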
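The fixed shape comes from padding and truncating every prompt to the same length. Below is a minimal sketch of that logic in plain Python, not the tokenizer's actual implementation. `pad_or_truncate` is a hypothetical helper, and the special-token ids (49406 for start-of-text, 49407 for end-of-text, with end-of-text reused as the pad token) are assumptions about the `openai/clip-vit-large-patch14` vocabulary.

```python
# Assumed special-token ids for the openai/clip-vit-large-patch14 vocabulary.
BOS_ID = 49406   # <|startoftext|>
EOS_ID = 49407   # <|endoftext|>, assumed to double as the pad token
MAX_LENGTH = 77  # tokenizer.model_max_length for this tokenizer


def pad_or_truncate(token_ids, max_length=MAX_LENGTH):
    """Hypothetical helper: wrap ids in BOS/EOS, then force a fixed length."""
    # Truncation: keep at most max_length - 2 ids so BOS and EOS still fit.
    body = list(token_ids)[: max_length - 2]
    ids = [BOS_ID] + body + [EOS_ID]
    real = len(ids)
    # Padding: fill the remainder with the pad id.
    ids += [EOS_ID] * (max_length - real)
    # Attention mask: 1 for real tokens, 0 for padding.
    mask = [1] * real + [0] * (max_length - real)
    return ids, mask


short_ids, short_mask = pad_or_truncate(range(1000, 1008))  # 8 made-up ids
long_ids, long_mask = pad_or_truncate(range(1000, 1200))    # 200 made-up ids
print(len(short_ids), len(long_ids))  # 77 77
```

Short and long prompts both come out at length 77, which is why `input_ids` has shape `(batch_size, 77)` regardless of prompt length.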
with torch.no_grad():
    # [0] selects the per-token hidden states (last_hidden_state)
    text_embeddings = text_encoder(text_input.input_ids.to(torch_device))[0]
# text embeddings is a tensor of shape torch.Size([1, 77, 768])
max_length = text_input.input_ids.shape[-1]
batch_size = 1
uncond_input = tokenizer(
    [""] * batch_size,
    padding="max_length",
    max_length=max_length,
    return_tensors="pt",
)
with torch.no_grad():
    uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0]
# uncond_embeddings is torch.Size([1, 77, 768])
text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
# text_embeddings.shape => torch.Size([2, 77, 768])
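Stacking the unconditional embeddings ahead of the prompt embeddings is the layout typically used for classifier-free guidance, where one batched denoiser call produces both predictions. The sketch below shows how such a batch might be consumed. It is an illustration under assumptions, not the Stable Diffusion pipeline itself: `fake_unet` is a hypothetical stand-in for the real denoiser, the tensors are random, and `guidance_scale = 7.5` and the latent shape are assumed values.

```python
import torch

guidance_scale = 7.5  # assumed value, chosen only for illustration

# Random stand-in for text_embeddings: row 0 unconditional, row 1 prompt.
text_embeddings = torch.randn(2, 77, 768)


def fake_unet(latents, encoder_hidden_states):
    """Hypothetical stand-in for the denoiser: output is shaped like latents."""
    # Scale each sample by a scalar derived from its embedding so the two
    # halves of the batch differ; a real UNet would use cross-attention.
    scale = encoder_hidden_states.mean(dim=(1, 2)).view(-1, 1, 1, 1)
    return latents * scale


latents = torch.randn(1, 4, 64, 64)  # assumed latent shape

# Duplicate the latents so one forward pass covers both embeddings.
latent_model_input = torch.cat([latents] * 2)
noise_pred = fake_unet(latent_model_input, text_embeddings)

# Split in the same order the embeddings were concatenated.
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
guided = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
print(guided.shape)
```

The order matters: because the unconditional embeddings come first in `torch.cat`, the first chunk of the output is the unconditional prediction.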