Last active November 28, 2023 16:48
You probably don't know how to do Prompt Engineering, let me educate you.

You probably don't know how to do Prompt Engineering

(This post could also be titled "Features missing from most LLM front-ends that should exist")

Apologies for the snarky title, but there has been a huge amount of discussion around so-called "Prompt Engineering" these past few months on all kinds of platforms. Much of it comes from individuals who are peddling an awful lot of "Prompting" and very little "Engineering".

Most of these discussions amount to little more than users finding that more creative and complicated prompts can solve a task that a simpler prompt could not. I claim this is not Prompt Engineering. This is not to say that crafting good prompts is easy, but it does not involve any sophisticated modification of the general "template" of a prompt.

Others, who I think do deserve to call themselves "Prompt Engineers" (and an awful lot more than that), have been writing about and using the rich new ecosystem of tooling around LLMs for features such as templates, additional memory, and custom decoders. Examples include LangChain, vector database technologies, txtai/txtchat, my own work on token-level constrained text generation, Hugging Face's work on sequence-level constrained text generation, and many others. Many of these tools have had entire, well-funded companies formed around them. Despite the money and hype around LLMs, it is still shockingly difficult to prompt them, but this doesn't have to be the case!

Thank you Stable Diffusion Community for showing us in NLP the way

We are fortunate that an awful lot of very smart people have implemented many really neat prompt engineering techniques within Stable Diffusion, and more specifically within the Automatic1111 webui. I am going to highlight some of these packages and techniques because I want to explain and demonstrate that they have LLM analogues, which are being unjustly forgotten about or left unimplemented by LLM front-ends. Some naysayers seem to think no such analogues exist. My hope is that this gist puts the final nail in the coffin of our current non-creative approach to prompting LLMs. Let's list the techniques that community has pioneered which are broadly possible for us to use in NLP and which, to my knowledge, have not been implemented in any serious capacity in any repo.

  1. Prompt Alternating (Implemented)

We can implement prompt alternating by switching the input prompt between the two user-supplied prompts at each generation step, while carrying over everything generated so far.

Imagine that you want 20 tokens of output while alternating between two prompts, "I like apple" and "I like bananas", so that the input prompt looks like this: [I like apple:I like bananas]

For the first generated token, the input is "I like apple"; let's say it generates "because". For the second token, the input is "I like bananas because"; let's say it generates "I". For the third token, the input is "I like apple because I", and so on.

  2. Prompt Editing

Similar to the above, but we choose how many tokens to generate with one prompt before switching to another.

  3. Prompt Weighting (Implemented)

In an LLM front-end, I should be able to use "()" in the prompt to increase the model's attention to the enclosed words and "[]" to decrease it. I should also be able to combine multiple modifiers.

  4. Prompt Blending (Implemented)

I should be able to compute the average, or a weighted average, of the embeddings of multiple tokens in a prompt. This lets us ask a model "What is the definition of {apple|orange}?", where {apple|orange} is the mathematical average of those two words in embedding space.

  5. Prompt Fusion (Advanced Prompt Blending)

Prompt Blending, but with far more flexibility: a third point is used as an "anchor".
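
To make the scheduling behind Prompt Alternating and Prompt Editing concrete without loading a model, here is a minimal sketch. It assumes word-level "tokens" and a hypothetical `next_token` callable standing in for the LLM; `generate_with_schedule`, `hold`, and `stub_next_token` are names invented for this illustration and do not come from any library. With `hold=1` the active prompt changes on every generated token (alternating); with a larger `hold`, each prompt stays active for that many tokens before the switch, which is one way to read prompt editing.

```python
from typing import Callable, List, Sequence


def generate_with_schedule(
    prompts: Sequence[Sequence[str]],
    hold: int,
    num_tokens: int,
    next_token: Callable[[List[str]], str],
) -> List[str]:
    """Generate num_tokens tokens, rotating the active prompt every `hold` steps."""
    if hold < 1:
        raise ValueError("hold must be at least 1")
    generated: List[str] = []
    for step in range(num_tokens):
        # Pick the prompt for this step; everything generated so far is kept.
        active = prompts[(step // hold) % len(prompts)]
        context = list(active) + generated
        generated.append(next_token(context))
    return generated


def stub_next_token(context: List[str]) -> str:
    # Hypothetical stand-in for a model: returns the third word of the context,
    # i.e. the last word of whichever three-word prompt is currently active.
    return context[2]


prompts = [["I", "like", "apple"], ["I", "like", "bananas"]]
alternating = generate_with_schedule(prompts, hold=1, num_tokens=4, next_token=stub_next_token)
editing = generate_with_schedule(prompts, hold=2, num_tokens=4, next_token=stub_next_token)
print(alternating)  # ['apple', 'bananas', 'apple', 'bananas']
print(editing)      # ['apple', 'apple', 'bananas', 'bananas']
```

Swapping `stub_next_token` for a function that runs a real model on the joined context would give the same control flow as a real implementation, at the cost of one forward pass per token.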

This gist includes extremely basic LLM implementations of these techniques, written by yours truly, all contributed in the hope that this spurs the community at large to implement and experiment with these features. The three marked "(Implemented)" are given below; I will finish the other two in the next few days as my time allows.

### Implementation of Prompt Alternating for LLMs
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def prompt_alternating(prompt, insert_position, alternate_prompts, num_tokens):
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    prompt_tokens = tokenizer.encode(prompt, return_tensors="pt")
    # output_tokens holds the prompt plus everything generated so far (without the alternates)
    output_tokens = prompt_tokens.clone()
    for _ in range(num_tokens):
        # Cycle through the alternates, one per generated token
        alternate_index = len(output_tokens[0]) % len(alternate_prompts)
        alternate = alternate_prompts[alternate_index]
        alternate_tokens = tokenizer.encode(alternate, return_tensors="pt")
        # Debug output: the prompt prefix and the running continuation
        print(prompt_tokens[:, :insert_position])
        print(output_tokens[:, insert_position:])
        # Splice the current alternate into the prompt at insert_position
        input_ids = torch.cat((prompt_tokens[:, :insert_position], alternate_tokens, output_tokens[:, insert_position:]), dim=-1)
        # Generate exactly one new token and keep only that token
        next_token = model.generate(input_ids, max_length=input_ids.shape[1] + 1, do_sample=True, pad_token_id=tokenizer.eos_token_id)[:, -1].unsqueeze(0)
        output_tokens = torch.cat((output_tokens, next_token), dim=-1)
    generated_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
    return generated_text

prompt = "This is a apple"
insert_position = 3
alternate_prompts = ["blue", "red", "yellow"]
num_tokens = 100
result = prompt_alternating(prompt, insert_position, alternate_prompts, num_tokens)
print(result)
### Implementing Automatic1111 style attention weights
### Note: GPT-2 is very temperamental with this technique; it seems to need a high temperature for even close to coherent output
import re
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def modify_attention_mask(prompt, model, tokenizer):
    tokens = []
    attention_modifiers = []
    add_space = False
    # Split on parentheses so each "(text:weight)" group becomes its own segment
    for token in re.split(r'\(|\)', prompt):
        if ':' in token:
            word, modifier = token.split(':')
            modifier = float(modifier.strip())
        else:
            word = token.strip()
            modifier = 1.0
        current_tokens = tokenizer.tokenize(word)
        if add_space and current_tokens:
            tokens.append('Ġ')  # Space token for GPT-2
            attention_modifiers.append(1.0)  # The separator keeps the default weight
        tokens.extend(current_tokens)
        attention_modifiers.extend([modifier] * len(current_tokens))
        add_space = True
    # One weight per token, passed in place of the usual 0/1 attention mask
    attention_mask = torch.tensor([attention_modifiers])
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
    return input_ids, attention_mask

def custom_generate(prompt, model, tokenizer, **kwargs):
    input_ids, attention_mask = modify_attention_mask(prompt, model, tokenizer)
    model.config.attention_probs_dropout_prob = 0.0
    # Generate with the modified attention mask
    with torch.no_grad():
        output_sequences = model.generate(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
    return tokenizer.decode(output_sequences[0], skip_special_tokens=True)

model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
prompt = "The (large house:1.0001) was situated on a hill. The buildings were made in an enormous block by the three towers of the four houses, with high ceilings of over one hundred and eight inches. They were built with stones and wood and all are from small scale timber."
generated_text = custom_generate(prompt, model, tokenizer, do_sample=True, temperature=20.0, max_length=200)
print(generated_text)
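
The "()" and "[]" syntax described under Prompt Weighting can be parsed independently of any model. The sketch below is one possible parser, assuming the Automatic1111 convention that "(text)" multiplies the weight by 1.1, "[text]" divides it by 1.1, and "(text:1.5)" sets it explicitly. The name `parse_weighted_prompt` and the `step` parameter are hypothetical, and nested or unbalanced brackets are deliberately not handled.

```python
import re
from typing import List, Tuple

# Three alternatives: "(text)" or "(text:weight)", "[text]", or plain text.
_SEGMENT = re.compile(
    r"\(([^():\[\]]+)(?::\s*([0-9]*\.?[0-9]+))?\)"
    r"|\[([^()\[\]]+)\]"
    r"|([^()\[\]]+)"
)


def parse_weighted_prompt(prompt: str, step: float = 1.1) -> List[Tuple[str, float]]:
    """Split a prompt into (text, weight) segments. Nesting is not supported."""
    segments: List[Tuple[str, float]] = []
    for match in _SEGMENT.finditer(prompt):
        paren_text, explicit, bracket_text, plain = match.groups()
        if paren_text is not None:
            # "(text:1.5)" uses the explicit weight, "(text)" uses the default step.
            weight = float(explicit) if explicit is not None else step
            segments.append((paren_text, weight))
        elif bracket_text is not None:
            # "[text]" de-emphasizes by the same step.
            segments.append((bracket_text, 1.0 / step))
        else:
            segments.append((plain, 1.0))
    return segments


for text, weight in parse_weighted_prompt("The (large house:1.5) on a [small] hill (today)"):
    print(repr(text), round(weight, 3))
```

Each segment can then be tokenized separately and its weight repeated once per token, which is the shape of input the attention-weight code above expects.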
### Implementation of Prompt Blending for an LLM
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('gpt2-xl')
# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead
model = AutoModelForCausalLM.from_pretrained('gpt2-xl', device_map='auto')

# Tokenize the entire prompt
prompt = "I am eating today "
input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)

# Get the embeddings for the entire prompt
all_embeddings = model.transformer.wte(input_ids)

# List of sequences to average
sequences = ["delicious chow mein", "delicious ice cream", "tasty pizza"]

# List of weights for each sequence
weights = [0.6, 0.3, 0.1]
assert len(sequences) == len(weights), "Weights and sequences must have the same length."

# Tokenize and retrieve the embeddings for the sequences
sequence_embeddings = []
for seq in sequences:
    input_ids_seq = tokenizer.encode(seq, return_tensors='pt').to(model.device)
    embeddings_seq = model.transformer.wte(input_ids_seq)
    # Mean-pool over tokens so sequences of different lengths can be stacked
    sequence_embeddings.append(embeddings_seq.mean(dim=1))

# Calculate the weighted average embeddings for the desired sequences
weights_tensor = torch.tensor(weights).view(-1, 1, 1).to(all_embeddings.device)
weighted_embeddings = torch.stack(sequence_embeddings, dim=0) * weights_tensor
average_embedding = weighted_embeddings.sum(dim=0)

# Insert position for the averaged embeddings in the prompt
insert_position = 3

# Concatenate the averaged embeddings with the prompt embeddings at the specified position
modified_embeddings = torch.cat([all_embeddings[:, :insert_position], average_embedding.unsqueeze(1), all_embeddings[:, insert_position:]], dim=1)

# Use the modified embeddings as input
output = model.generate(inputs_embeds=modified_embeddings, do_sample=True, max_length=100)
decoded_output = tokenizer.decode(output[0])
print(decoded_output)
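
Prompt Fusion is not implemented in this gist, so the following is only a sketch of one way to read "use a third point as an anchor". It assumes the anchor acts as the control point of a quadratic Bezier curve between two embeddings, and it uses plain Python lists in place of real embedding vectors. `blend`, `fuse_with_anchor`, and the parameter `t` are hypothetical names for this illustration, not the API of any existing extension.

```python
from typing import List, Sequence

Vector = List[float]


def blend(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted average of equally sized vectors (plain prompt blending)."""
    if len(vectors) != len(weights):
        raise ValueError("vectors and weights must have the same length")
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def fuse_with_anchor(start: Vector, end: Vector, anchor: Vector, t: float) -> Vector:
    """Point at position t on the quadratic Bezier curve from start to end.

    The anchor is the control point: the curve bends toward it without
    necessarily passing through it.
    """
    a = (1.0 - t) ** 2
    c = 2.0 * (1.0 - t) * t
    b = t ** 2
    return [a * s + c * k + b * e for s, k, e in zip(start, anchor, end)]


apple = [0.0, 0.0]
orange = [2.0, 0.0]
anchor = [1.0, 2.0]
print(blend([apple, orange], [0.5, 0.5]))            # straight-line midpoint
print(fuse_with_anchor(apple, orange, anchor, 0.5))  # midpoint pulled toward the anchor
```

With `t=0` the result is the first embedding and with `t=1` the second; intermediate values trace a path bent toward the anchor, which plain blending cannot express.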

Great work


Inspiring View on Prompt Engineering

Your article presents a compelling view on prompt engineering, challenging the prevailing paradigm and championing a more nuanced and sophisticated approach to the use of large language models.

Considering Challenges and Practical Applications

However, while your argument for advanced techniques is persuasive, it's crucial not to overlook potential challenges and drawbacks that might accompany these methods. As we venture into this uncharted territory, a balanced view of the landscape is necessary. Furthermore, a deeper dive into practical applications of these techniques could help others grasp the tangible benefits and understand their implications better. This would provide a more holistic perspective.

Unflinching Analysis and Advocacy for Innovation

Nonetheless, your unflinching analysis and advocacy for innovation within the field is praiseworthy. Your work serves as a thought-provoking call to action for others to reconsider their approach to prompt engineering, fostering a much-needed dialogue that could drive the field forward. I look forward to seeing the ripple effects of your perspectives in the future evolution of large language models.


Implications for LLM Training and Computation

How might the implementation of these advanced prompt engineering techniques affect the training and computational requirements of LLMs?

Adoption in LLM Front-ends

Given the potential of these advanced techniques, why have they not been widely adopted or implemented in LLM front-ends?

Possible Challenges and Complications

Could the adoption of these techniques lead to unforeseen challenges or complications in the use of LLMs?


@JJC-code Your response looks like it was autogenerated by some kind of LLM. Glad to see that you found this to be good input to play around with using some LLM service.

This blog post really isn't long enough to be worth the effort. I think it will be far more interesting to ask sufficiently advanced LLMs about other kinds of techniques for further modifying how they output things. Hopefully they'd give some interesting ideas, and possibly interesting and high quality code


@Hellisotherpeople - Yes, in part it is, because I am working on a solution that is based on LLM and I have used it for this purpose, but the questions are mainly from me, because I also realized a mania for talking about prompting as a form of only making creative queries, and your approach was a little different, which I liked.


@Hellisotherpeople great short post! (Like many recent papers on prompt engineering could be haha)

I agree the tooling space for prompt engineering could be improved to allow more intricate techniques.

Have you seen Guidance by Microsoft?

As the name suggests it's kind of like guardrails for LLMs.

Looks to implement a few similar ideas (not entirely the same) to what you're talking about:
