Last active November 20, 2023 13:59
Using LlamaIndex (GPT Index) with Azure OpenAI Service
import os
import openai
from dotenv import load_dotenv
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader, LLMPredictor, PromptHelper
from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings
from llama_index import LangchainEmbedding
# Load env variables (create .env with OPENAI_API_KEY and OPENAI_API_BASE)
load_dotenv()
# Configure OpenAI API
openai.api_type = "azure"
openai.api_version = "2022-12-01"
openai.api_base = os.getenv('OPENAI_API_BASE')
openai.api_key = os.getenv("OPENAI_API_KEY")
deployment_name = "text-davinci-003"
# Create LLM via Azure OpenAI Service
llm = AzureOpenAI(deployment_name=deployment_name)
llm_predictor = LLMPredictor(llm=llm)
embedding_llm = LangchainEmbedding(OpenAIEmbeddings())
# Define prompt helper
max_input_size = 3000
num_output = 256
chunk_size_limit = 1000 # token window size per document
max_chunk_overlap = 20 # overlap for each token fragment
prompt_helper = PromptHelper(max_input_size=max_input_size, num_output=num_output, max_chunk_overlap=max_chunk_overlap, chunk_size_limit=chunk_size_limit)
# Read txt files from data directory
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, embed_model=embedding_llm, prompt_helper=prompt_helper)
# Query index with a question
response = index.query("What is Azure OpenAI Service?")
print(response)
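
The script above reads OPENAI_API_BASE and OPENAI_API_KEY with os.getenv, which silently returns None when a variable is missing. A minimal sketch of a pre-flight check follows; the helper name read_azure_settings is made up for illustration and is not part of openai, langchain, or llama_index.

```python
import os
from typing import Dict, Mapping

# The two variables the gist expects in .env
REQUIRED_VARS = ("OPENAI_API_BASE", "OPENAI_API_KEY")


def read_azure_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty.

    Hypothetical helper: it only inspects the mapping it is given.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling read_azure_settings() right after load_dotenv() would surface a missing key before any request is sent, instead of failing later inside the index build.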

@arindamhazramsft I did as well:

from llama_index import ServiceContext
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embedding_llm)

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

I got past that error, but I ended up with:

openai.error.InvalidRequestError: Must provide an 'engine' or 'deployment_id' parameter to create a <class 'openai.api_resources.completion.Completion'>

@carchi8py could you share how you define the service_context and embedding_llm? I'm trying to replicate it, but I'm getting the same error you had about defining the engine or deployment_id.
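
For context on the error above: in the Azure OpenAI REST layout the deployment name is part of the request path, so a call that carries no engine or deployment_id has nowhere to go. Below is a rough sketch of how such a URL is put together. The function name and the example-resource hostname are placeholders of mine, not anything taken from the openai package.

```python
from urllib.parse import quote, urlencode


def azure_completions_url(api_base: str, deployment: str, api_version: str) -> str:
    """Build the completions URL for one Azure OpenAI deployment (illustrative only)."""
    if not deployment:
        # Mirrors the spirit of the error in the thread: no deployment, no endpoint
        raise ValueError("A deployment name is required for Azure OpenAI requests")
    base = api_base.rstrip("/")
    path = f"/openai/deployments/{quote(deployment, safe='')}/completions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


# Hypothetical resource name; deployment and API version are the ones from the gist
print(azure_completions_url(
    "https://example-resource.openai.azure.com/",
    "text-davinci-003",
    "2022-12-01",
))
```

The deployment segment has to match the name chosen in the Azure portal, which is not necessarily the model name.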


# set maximum input size
max_input_size = 4096
# set number of output tokens
num_outputs = 256
# set maximum chunk overlap
max_chunk_overlap = 20
# set chunk size limit
chunk_size_limit = 600
# openai_api_version must be defined beforehand (not shown in this comment)
llm = AzureChatOpenAI(deployment_name="gpt-35-turbo", temperature=0, openai_api_version=openai_api_version)
llm_predictor = ChatGPTLLMPredictor(llm=llm, retry_on_throttling=False)
embeddings = LangchainEmbedding(OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1, openai_api_version=openai_api_version))
prompt_helper = PromptHelper(max_input_size=max_input_size, num_output=num_outputs, max_chunk_overlap=max_chunk_overlap, chunk_size_limit=chunk_size_limit)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embeddings, prompt_helper=prompt_helper)
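
To make the chunk_size_limit and max_chunk_overlap settings above more concrete, here is a toy sketch of splitting a token sequence into fixed-size chunks where each chunk repeats the tail of the previous one. This is not LlamaIndex's actual splitter, and split_with_overlap is a name invented for this example; it only illustrates the arithmetic (a new chunk starts every chunk_size - overlap tokens).

```python
from typing import List, Sequence


def split_with_overlap(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into chunks of at most chunk_size, sharing `overlap` tokens between neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this chunk already reached the end of the input
    return chunks


# Stand-in "tokens"; a real tokenizer would produce these
tokens = [str(i) for i in range(1500)]
chunks = split_with_overlap(tokens, chunk_size=600, overlap=20)
print([len(c) for c in chunks])
```

With 1500 tokens, a chunk size of 600, and an overlap of 20, chunks start at positions 0, 580, and 1160.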


I did the following.

The generate function called a few different LlamaHub loaders I was using and created static files.

llm = ChatOpenAI(deployment_id="gpt-4-32k-0314")
llm_predictor = LLMPredictor(llm=llm)
embedding_llm = LangchainEmbedding(OpenAIEmbeddings())

options = parse_args()
if options.generate:
    ...  # generate step (LlamaHub loaders writing static files) not shown

# Load PDFs from Confluence
documents = SimpleDirectoryReader(DATA_DIR).load_data()

# Define prompt helper
max_input_size = 3000
num_output = 256
chunk_size_limit = 1000
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size=max_input_size, num_output=num_output, max_chunk_overlap=max_chunk_overlap, chunk_size_limit=chunk_size_limit)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embedding_llm
)
# Create index
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)


dell1503 commented May 29, 2023

Did not work for me.
With the following packages:


I still got the API error:
openai.error.InvalidRequestError: The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again.

I have a deployment in Azure called gpt-35-turbo:
[Screenshot 2023-05-29 at 21:18:53]

I ran this part of the code, and it worked:

Any ideas?


I had also deployed the training model...
