@carry0987
Forked from primaryobjects/data.py
Created July 5, 2023 07:45
LLM ChatGPT prompt engineering accuracy statistics. Using F-score and accuracy to measure effectiveness of prompts for classification. https://learn.deeplearning.ai/chatgpt-prompt-eng/lesson/1/introduction

Measuring Prompt Engineering Accuracy

Prompt engineering is the practice of wording prompts to get the most accurate results from large language models (LLMs), such as OpenAI's ChatGPT, on tasks like annotating and classifying datasets.

About

This project compares different prompt styles for asking an LLM to classify whether a sentence expresses anger. Notice how the wording and formatting of the prompt supplied to the LLM change the resulting F-score and accuracy.

The best-performing prompts specify precisely the format of the input data (a sentence enclosed in backticks) and the expected output (only the word "yes" or "no"). In these experiments, the more clearly and concisely the task was described to the LLM, the better the results.
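A minimal sketch of that idea, assuming the same payload shape that predict uses further down (the instruction, then the sentence in triple backticks). build_payload and parse_label are hypothetical helpers, not part of the original code. Unlike predict's substring check ('yes' in response.lower()), parse_label accepts only a bare "yes" or "no" and returns None for any reply that ignores the requested format, so off-format answers can be counted rather than silently scored as "no".

```python
from typing import Optional


def build_payload(instruction: str, sentence: str) -> str:
    """Append the sentence, enclosed in triple backticks, to the instruction."""
    return instruction + " ```" + sentence + "```"


def parse_label(response: str) -> Optional[int]:
    """Map a bare "yes"/"no" reply to 1/0; return None for anything else."""
    # Tolerate surrounding whitespace, trailing "." or "!", and capitalization.
    word = response.strip().rstrip(".!").strip().lower()
    if word == "yes":
        return 1
    if word == "no":
        return 0
    return None  # the model did not follow the requested output format


if __name__ == "__main__":
    prompt = ('Determine if the following sentence enclosed with backticks contains '
              'a sentiment of anger. Respond only with the word "yes" or "no".')
    print(build_payload(prompt, "I hate this food."))
    for reply in ["Yes.", "no", "No, it is not angry."]:
        print(repr(reply), "->", parse_label(reply))
```

The replies "Yes.", "No." and "no" seen in the logged runs all parse cleanly; a longer answer such as "No, it is not angry." maps to None.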

Statistics

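Each prompt is run over the 55 labeled sentences in data.py (21 angry, 34 not angry). The predictions are tallied in a 2×2 confusion matrix, printed as two rows: the first row covers sentences whose true label is 0 (not angry), the second those whose true label is 1 (angry), and within each row the two columns count predictions of 0 and 1. From this matrix, precision and recall are computed with anger as the positive class and combined into the F-score (their harmonic mean); accuracy is the fraction of sentences classified correctly.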
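Read the printed matrix as [[TN, FP], [FN, TP]], with anger as the positive class (first row: true label 0, second row: true label 1; columns: predicted 0, predicted 1). The metrics returned by predict then follow the standard definitions below, worked through here for the prompt 1 matrix [33, 1] / [14, 7] from the results:

```latex
\begin{aligned}
\text{precision} &= \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \\
F_1 &= \frac{2\,\text{precision}\cdot\text{recall}}{\text{precision} + \text{recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \\[6pt]
\text{Prompt 1 } (TN = 33,\ FP = 1,\ FN = 14,\ TP = 7): \quad
\text{precision} &= \tfrac{7}{8} = 0.875, \quad
\text{recall} = \tfrac{7}{21} \approx 0.333, \quad
F_1 = \tfrac{14}{29} \approx 0.483, \quad
\text{accuracy} = \tfrac{40}{55} \approx 0.727.
\end{aligned}
```

The form 2TP / (2TP + FP + FN) makes it visible that the F-score ignores true negatives, which is why it punishes prompt 1's 14 missed angry sentences much harder than accuracy does.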

Results

1. Is the sentence about anger? Respond only with the word "yes" or "no".

Accuracy: 73%, F-score: 0.48

result = predict('Is the sentence about anger? Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])

[33, 1]
[14, 7]
0.48275862068965514 0.7272727272727273

2. Determine if the following sentence is angry. Respond only with the word "yes" or "no".

Accuracy: 87%, F-score: 0.80

result = predict('Determine if the following sentence is angry. Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])

[34, 0]
[7, 14]
0.8 0.8727272727272727

3. Determine if the following sentence enclosed with backticks contains a sentiment of anger. Respond only with the word "yes" or "no".

Accuracy: 93%, F-score: 0.89

result = predict('Determine if the following sentence enclosed with backticks contains a sentiment of anger. Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])

[34, 0]
[4, 17]
0.8947368421052632 0.9272727272727272

4. Determine if the following sentence enclosed with backticks contains a sentiment of anger, disgust, or negativity. Respond only with the word "yes" or "no".

Accuracy: 93%, F-score: 0.90

result = predict('Determine if the following sentence enclosed with backticks contains a sentiment of anger, disgust, or negativity. Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])

[33, 1]
[3, 18]
0.9 0.9272727272727272
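Prompts 3 and 4 reach the same accuracy but different F-scores. The sketch below recomputes precision, recall, F-score, and accuracy from the four confusion matrices reported above (rows are the true label, columns the prediction, matching how predict builds them); REPORTED and metrics are illustrative names, not part of the original code. Both prompts misclassify four sentences, but prompt 4 trades one missed angry sentence for one false alarm. Because the F-score ignores true negatives, the extra true positive raises it from about 0.895 to 0.900 while accuracy stays at 51/55.

```python
# Confusion matrices copied from the results above.
# Layout (as built by predict): rows = true label (0, 1), columns = predicted label (0, 1).
REPORTED = {
    1: [[33, 1], [14, 7]],
    2: [[34, 0], [7, 14]],
    3: [[34, 0], [4, 17]],
    4: [[33, 1], [3, 18]],
}


def metrics(cm):
    """Return (precision, recall, f_score, accuracy) for the positive ("angry") class."""
    tn, fp = cm[0]
    fn, tp = cm[1]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    return precision, recall, f_score, accuracy


if __name__ == "__main__":
    for prompt_id, cm in REPORTED.items():
        p, r, f, a = metrics(cm)
        errors = cm[0][1] + cm[1][0]  # false positives + false negatives
        print(f"prompt {prompt_id}: precision={p:.3f} recall={r:.3f} "
              f"F={f:.3f} accuracy={a:.3f} errors={errors}")
```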

References

https://learn.deeplearning.ai/chatgpt-prompt-eng/lesson/1/introduction

data.py

messages = [
{'content': 'I hate this food.', 'truth': 1},
{'content': 'This movie is terrible.', 'truth': 1},
{'content': 'I really like this play.', 'truth': 0},
{'content': 'I am just not sure.', 'truth': 0},
{'content': 'I can\'t believe you would do something like that!', 'truth': 1},
{'content': 'This is absolutely unacceptable!', 'truth': 1},
{'content': 'I am so angry right now, I can\'t even think straight.', 'truth': 1},
{'content': 'How could you be so thoughtless?', 'truth': 1},
{'content': 'You have no idea how much you\'ve hurt me.', 'truth': 1},
{'content': 'I\'m so frustrated I could scream.', 'truth': 1},
{'content': 'I\'ve never been so mad in my life!', 'truth': 1},
{'content': 'I can\'t even look at you right now.', 'truth': 1},
{'content': 'You\'ve really crossed the line this time.', 'truth': 1},
{'content': 'I won\'t stand for this kind of behavior.', 'truth': 1},
{'content': 'It\'s a beautiful day outside.', 'truth': 0},
{'content': 'I had a great time at the party last night.', 'truth': 0},
{'content': 'The flowers in the garden are blooming.', 'truth': 0},
{'content': 'I\'m looking forward to my vacation next week.', 'truth': 0},
{'content': 'The new restaurant in town has amazing food.', 'truth': 0},
{'content': 'I love spending time with my family and friends.', 'truth': 0},
{'content': 'The sunset was breathtaking.', 'truth': 0},
{'content': 'I always feel better after going for a run.', 'truth': 0},
{'content': 'The book I just finished reading was really good.', 'truth': 0},
{'content': 'I feel so relaxed after taking a hot bath.', 'truth': 0},
{'content': 'Good case. Excellent value.', 'truth': 0},
{'content': 'Great for the jawbone.', 'truth': 0},
{'content': 'Tied to charger for conversations lasting more than 45 minutes.MAJOR PROBLEMS!!', 'truth': 1},
{'content': 'The mic is great.', 'truth': 0},
{'content': 'I have to jiggle the plug to get it to line up right to get decent volume.', 'truth': 0},
{'content': 'If you have several dozen or several hundred contacts.', 'truth': 0},
{'content': 'then imagine the fun of sending each of them one by one.', 'truth': 0},
{'content': 'If you are Razr owner...you must have this!', 'truth': 0},
{'content': 'Needless to say I wasted my money.', 'truth': 1},
{'content': 'What a waste of money and time!.', 'truth': 1},
{'content': 'And the sound quality is great.', 'truth': 0},
{'content': 'He was very impressed when going from the original battery to the extended battery.', 'truth': 0},
{'content': 'If the two were seperated by a mere 5+ ft I started to notice excessive static and garbled sound fro...', 'truth': 0},
{'content': 'Very good quality though the design is very odd as the ear "clip" is not very comfortable at all.', 'truth': 0},
{'content': 'Highly recommend for any one who has a blue tooth phone.', 'truth': 0},
{'content': 'I advise EVERYONE DO NOT BE FOOLED!', 'truth': 1},
{'content': 'So Far So Good!. Works great!.', 'truth': 0},
{'content': 'It clicks into place in a way that makes you wonder how long that mechanism would last.', 'truth': 0},
{'content': "I went on Motorola's website and followed all directions but could not get it to pair again.", 'truth': 0},
{'content': 'I bought this to use with my Kindle Fire and absolutely loved it!', 'truth': 0},
{'content': 'The commercials are the most misleading.', 'truth': 1},
{'content': "I have yet to run this new battery below two bars and that's three days without charging. I bought it for my mother and she had a problem with the battery.", 'truth': 1},
{'content': 'Great Pocket PC / phone combination.', 'truth': 0},
{'content': "I've owned this phone for 7 months now and can say that it's the best mobile phone I've had.", 'truth': 0},
{'content': "I didn't think that the instructions provided were helpful to me.", 'truth': 0},
{'content': 'People couldnt hear me talk and I had to pull out the earphone and talk on the phone.', 'truth': 0},
{'content': "Doesn't hold charge.", 'truth': 0},
{'content': 'This is a simple little phone to use but the breakage is unacceptible.', 'truth': 1},
{'content': 'This product is ideal for people like me whose ears are very sensitive.', 'truth': 0},
{'content': 'It is unusable in a moving car at freeway speed.', 'truth': 1},
{'content': 'I have two more years left in this contract and I hate this phone.', 'truth': 1}
]
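Apostrophes inside single-quoted Python strings must be escaped (\') or the string must be double-quoted. Writing them SQL-style as '' does not work: Python concatenates adjacent string literals, so 'that''s' is parsed as 'that' 's' and the apostrophe is silently dropped. This is why the battery sentence appears as "thats three days" in the logged runs further down. A quick check:

```python
# Adjacent string literals are concatenated at compile time, so '' inside a
# single-quoted string is NOT an escaped apostrophe: 'that''s' == 'that' 's'.
doubled = 'that''s three days'
escaped = 'that\'s three days'
double_quoted = "that's three days"

print(doubled)  # thats three days
print(escaped)  # that's three days
assert escaped == double_quoted
```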
import os
import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file

from data import messages

# NOTE: openai.ChatCompletion is the legacy interface of the openai SDK before 1.0;
# it was removed in openai>=1.0, so this script needs an older openai release.
openai.api_key = os.getenv('OPENAI_API_KEY')


def get_completion(prompt, model="gpt-3.5-turbo"):
    # Send a single user message and return the reply text.
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]


def get_completion_from_messages(messages, model="gpt-3.5-turbo", temperature=0):
    # Same as get_completion, but for a full list of chat messages.
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,  # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]


def predict(prompt):
    y_true = []
    y_pred = []
    responses = []
    for message in messages:
        # Append the sentence, enclosed in triple backticks, to the prompt and send it to ChatGPT.
        payload = prompt + ' ```' + message['content'] + '```'
        response = get_completion(payload)
        # Record the truth value and the predicted label (1 if the reply contains "yes").
        y_true.append(message['truth'])
        y_pred.append(1 if 'yes' in response.lower() else 0)
        responses.append(response)

    # Confusion matrix: rows are the true label (0, 1), columns the predicted label (0, 1).
    cm = [[0, 0], [0, 0]]
    for true_label, pred_label in zip(y_true, y_pred):
        cm[true_label][pred_label] += 1

    # Print the confusion matrix.
    for row in cm:
        print(row)

    # Precision and recall for the positive ("angry") class, guarding against division by zero.
    tp, fp, fn = cm[1][1], cm[0][1], cm[1][0]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * (precision * recall) / (precision + recall) if precision + recall else 0.0
    # Accuracy: fraction of sentences classified correctly.
    accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
    return {'f_score': f_score, 'accuracy': accuracy, 'truth': y_true, 'predicted': y_pred, 'responses': responses}


def summary(result):
    # Print every misclassified sentence with its truth, prediction, and raw model response.
    for i in range(len(messages)):
        if result['predicted'][i] != result['truth'][i]:
            print(messages[i]['content'] + ', truth: ' + str(result['truth'][i]) + ', predicted: ' + str(result['predicted'][i]) + ', response: ' + result['responses'][i])
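The helpers above call openai.ChatCompletion, which exists only in openai Python SDK releases before 1.0. A minimal sketch of get_completion for the 1.x client, assuming openai>=1.0 is installed and OPENAI_API_KEY is set in the environment. The optional client argument is a hypothetical addition, not in the original, so the function can be exercised with a stand-in object.

```python
def get_completion(prompt, model="gpt-3.5-turbo", client=None):
    """Send a single user message and return the reply text (openai>=1.0 client API)."""
    if client is None:
        # Imported lazily so the function can be tested without the SDK installed.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # degree of randomness of the model's output
    )
    # In the 1.x SDK, message is an object with a .content attribute, not a dict.
    return response.choices[0].message.content
```

predict can call this version unchanged, since it only uses the returned reply text.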
library(ggplot2)
library(RColorBrewer)
library(scales)

# Accuracy of prompts 1-4, taken from the results above.
data <- c(0.7273, 0.8727, 0.9273, 0.9273)
df <- data.frame(x = factor(seq_along(data)), y = data)

x <- ggplot(df, aes(x = x, y = y)) +
  geom_bar(stat = "identity", aes(fill = x)) +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal() +
  labs(x = "Prompt", y = "Accuracy", title = "Accuracy Scores for Prompt Engineering") +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  # White percentage labels, shifted down into the bars (vjust = 15).
  geom_text(aes(label = paste0(round(y * 100), "%")), vjust = 15, fontface = "bold", colour = "white", size = 5) +
  scale_y_continuous(labels = percent) +
  theme(legend.position = "none") +
  theme(axis.title.x = element_text(size = 14),
        axis.title.y = element_text(size = 14),
        axis.title.x.bottom = element_text(margin = margin(t = 10)),
        axis.title.y.left = element_text(margin = margin(r = 10)))
x
result = predict('Determine if the following sentence is angry. Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])
summary(result)
"""
[34, 0]
[7, 14]
0.8 0.8727272727272727
This movie is terrible., truth: 1, predicted: 0, response: No.
I won't stand for this kind of behavior., truth: 1, predicted: 0, response: No.
Needless to say I wasted my money., truth: 1, predicted: 0, response: No.
The commercials are the most misleading., truth: 1, predicted: 0, response: No.
I have yet to run this new battery below two bars and thats three days without charging. I bought it for my mother and she had a problem with the battery., truth: 1, predicted: 0, response: No.
It is unusable in a moving car at freeway speed., truth: 1, predicted: 0, response: No.
I have two more years left in this contract and I hate this phone., truth: 1, predicted: 0, response: No.
"""
result = predict('Is the sentence about anger? Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])
summary(result)
"""
[33, 1]
[14, 7]
0.48275862068965514 0.7272727272727273
I hate this food., truth: 1, predicted: 0, response: No.
This movie is terrible., truth: 1, predicted: 0, response: No.
You have no idea how much you've hurt me., truth: 1, predicted: 0, response: No.
I can't even look at you right now., truth: 1, predicted: 0, response: No.
I won't stand for this kind of behavior., truth: 1, predicted: 0, response: No.
I always feel better after going for a run., truth: 0, predicted: 1, response: Yes.
Tied to charger for conversations lasting more than 45 minutes.MAJOR PROBLEMS!!, truth: 1, predicted: 0, response: No.
Needless to say I wasted my money., truth: 1, predicted: 0, response: No.
What a waste of money and time!., truth: 1, predicted: 0, response: No.
I advise EVERYONE DO NOT BE FOOLED!, truth: 1, predicted: 0, response: No.
The commercials are the most misleading., truth: 1, predicted: 0, response: No.
I have yet to run this new battery below two bars and thats three days without charging. I bought it for my mother and she had a problem with the battery., truth: 1, predicted: 0, response: No.
This is a simple little phone to use but the breakage is unacceptible., truth: 1, predicted: 0, response: No.
It is unusable in a moving car at freeway speed., truth: 1, predicted: 0, response: No.
I have two more years left in this contract and I hate this phone., truth: 1, predicted: 0, response: No.
"""
result = predict('Determine if the following sentence enclosed with backticks contains a sentiment of anger. Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])
summary(result)
"""
[34, 0]
[4, 17]
0.8947368421052632 0.9272727272727272
Needless to say I wasted my money., truth: 1, predicted: 0, response: No.
The commercials are the most misleading., truth: 1, predicted: 0, response: no
I have yet to run this new battery below two bars and thats three days without charging. I bought it for my mother and she had a problem with the battery., truth: 1, predicted: 0, response: No.
It is unusable in a moving car at freeway speed., truth: 1, predicted: 0, response: No.
"""
result = predict('Determine if the following sentence enclosed with backticks contains a sentiment of anger, disgust, or negativity. Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])
summary(result)
"""
[33, 1]
[3, 18]
0.9 0.9272727272727272
Very good quality though the design is very odd as the ear "clip" is not very comfortable at all., truth: 0, predicted: 1, response: Yes.
The commercials are the most misleading., truth: 1, predicted: 0, response: No.
I have yet to run this new battery below two bars and thats three days without charging. I bought it for my mother and she had a problem with the battery., truth: 1, predicted: 0, response: No.
It is unusable in a moving car at freeway speed., truth: 1, predicted: 0, response: No.
"""