@astoeckl
Created December 12, 2021 08:15
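import os

import numpy as np
import openai
from tenacity import retry, stop_after_attempt, wait_random_exponential

# Uses the pre-1.0 openai package API (openai.Engine was removed in openai 1.0).
openai.api_key = "XXX-YOUkey"  # replace with your own API key

# Retry with randomized exponential backoff (window capped at 20 s), giving up after 6 attempts.
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def get_embedding(text, engine="davinci-similarity"):
    # replace newlines, which can negatively affect performance.
    text = text.replace("\n", " ")
    return openai.Engine(id=engine).embeddings(input=[text])['data'][0]['embedding']

# df_news: a pandas DataFrame with a "Text" column, assumed to be loaded earlier.
df_news['babbage_similarity'] = df_news.Text.apply(lambda x: get_embedding(x, engine='babbage-similarity'))
os.makedirs('output', exist_ok=True)  # to_csv does not create missing directories
df_news.to_csv('output/embedded_newsgroups.csv')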
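The `@retry` decorator on `get_embedding` comes from the tenacity library. To make its behaviour concrete, here is a minimal stdlib-only approximation, not tenacity's actual implementation: `retry_with_backoff` and `flaky` are hypothetical names, and tenacity's exact timing rules may differ in detail. Each failed attempt sleeps for a random time drawn from a window that doubles per attempt and is clamped between `min_wait` and `max_wait`; the last attempt lets its exception propagate.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 6,
    min_wait: float = 1.0,
    max_wait: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call func up to `attempts` times with jittered exponential backoff."""
    for attempt in range(1, attempts):
        try:
            return func()
        except Exception:
            # Window grows 1, 2, 4, 8, ... seconds, clamped to [min_wait, max_wait];
            # the actual sleep is drawn uniformly from [0, window].
            window = min(max(2 ** (attempt - 1), min_wait), max_wait)
            sleep(random.uniform(0, window))
    return func()  # final attempt: any exception propagates to the caller


calls = {"n": 0}


def flaky() -> str:
    """Hypothetical stand-in for an API call that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


waits: list[float] = []  # record sleep durations instead of actually sleeping
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, calls["n"], len(waits))
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` applies, which is what makes backoff useful against rate limits on an embeddings endpoint.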
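`DataFrame.to_csv` writes each list-valued cell as its string representation, so reading `output/embedded_newsgroups.csv` back with `pd.read_csv` yields strings such as `"[0.1, 0.2, ...]"` rather than vectors. A minimal sketch of turning those cells back into arrays and ranking rows by cosine similarity to a query vector follows; `parse_embedding`, `cosine_similarity`, and `rank_by_similarity` are hypothetical helper names, and the toy 3-dimensional vectors stand in for real embeddings. Whether the gist's author used numpy (imported but unused above) this way is an assumption.

```python
import ast

import numpy as np


def parse_embedding(cell: str) -> np.ndarray:
    """Turn a CSV cell such as "[0.1, 0.2]" back into a float vector."""
    return np.array(ast.literal_eval(cell), dtype=float)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_by_similarity(query: np.ndarray, cells: list[str]) -> list[tuple[int, float]]:
    """Return (row index, similarity) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, parse_embedding(c))) for i, c in enumerate(cells)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for the embedding column read back from the CSV.
cells = ["[1.0, 0.0, 0.0]", "[0.0, 1.0, 0.0]", "[0.7, 0.7, 0.0]"]
query = np.array([1.0, 0.1, 0.0])
for idx, score in rank_by_similarity(query, cells):
    print(idx, round(score, 3))
```

With real data, the query vector would itself come from `get_embedding`, using the same engine as the stored column so the vectors live in the same space.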