- Time: 30–40 minutes
- Goal: Take the semantic search you built and use its results as context for an LLM.
You will:
- Re-use your existing search_books function
- Build a prompt that includes retrieved books
- Call an LLM with that prompt
- Compare answers with and without retrieval
All changes happen in the same file you used for the similarity lab.
Your file should already have:
- get_embedding(text: str) -> list[float]
- search_books(query: str, k: int = 5) that returns something like:

[(name, subject, similarity), ...]

You should be able to do:

results = search_books("how do I build a website?", k=3)
print(results)

and see a few books.
If that works, you’re ready.
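The later steps unpack each row as (name, subject, similarity) and format the similarity with three decimals, so it can be worth checking that shape before going further. The sketch below is one way to do that; `check_results_shape` is a hypothetical helper that is not part of the earlier lab, and the sample rows are made up rather than taken from your catalogue.

```python
def check_results_shape(results) -> bool:
    """Return True if every row looks like (name, subject, similarity)."""
    for row in results:
        # Each row is assumed to be a 3-item tuple or list.
        if not isinstance(row, (tuple, list)) or len(row) != 3:
            return False
        name, subject, similarity = row
        # name and subject are assumed to be plain strings.
        if not isinstance(name, str) or not isinstance(subject, str):
            return False
        # The similarity must support the ".3f" format used in the prompt.
        try:
            format(similarity, ".3f")
        except (TypeError, ValueError):
            return False
    return True


# Hand-written rows for illustration only (not real catalogue data).
sample = [("Example Book A", "web", 0.81), ("Example Book B", "python", 0.64)]
print(check_results_shape(sample))
```

With your own data you would call `check_results_shape(search_books("how do I build a website?", k=3))` instead of using the sample rows.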
You’ll write a helper that:
- Takes the user’s question
- Calls search_books
- Builds a prompt string containing:
  - The user’s question
  - A short list of the most relevant books
Add this function to your file:
def build_prompt(user_query: str, k: int = 5) -> str:
    """
    Use semantic search to find relevant books and build a prompt for the LLM.
    """
    results = search_books(user_query, k=k)
    # One bullet per retrieved book.
    lines = []
    for name, subject, similarity in results:
        lines.append(
            f"- {name} (subject: {subject}, similarity: {similarity:.3f})"
        )
    books_block = "\n".join(lines)
    # The prompt text is kept flush-left so no stray indentation reaches the LLM.
    prompt = f"""
You are a helpful assistant recommending technical books.
The user asked this question:
\"\"\"{user_query}\"\"\"
Here are some relevant books from our internal catalogue:
{books_block}
Using ONLY these books as your source of truth:
- Explain which books are most suitable.
- Group them into beginner / intermediate / advanced.
- Justify your choices briefly.
If none of the books seem relevant, say so.
"""
    return prompt

At the bottom of the file, temporarily add:

if __name__ == "__main__":
    q = "What are the best books on artificial intelligence?"
    print(build_prompt(q))

Run the script. You should see:
- The original question
- A bullet list of books
- Clear instructions for the assistant
Once you’ve checked it, you can comment that block out or keep it for testing.
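One case build_prompt leaves implicit is an empty result list: the catalogue section of the prompt would simply be blank. The sketch below shows one possible way to make that explicit. `format_books_block`, its `min_similarity` parameter, and the placeholder text are all hypothetical additions for illustration, not part of the lab's required code.

```python
def format_books_block(results, min_similarity: float = 0.0) -> str:
    """Format (name, subject, similarity) rows as a bullet list for a prompt."""
    lines = [
        f"- {name} (subject: {subject}, similarity: {similarity:.3f})"
        for name, subject, similarity in results
        # Optionally drop weak matches; the default keeps everything >= 0.0.
        if similarity >= min_similarity
    ]
    if not lines:
        # Hypothetical placeholder so the LLM sees an explicit "nothing found".
        return "(no matching books found)"
    return "\n".join(lines)


# Made-up rows for illustration only.
print(format_books_block([("Example Book A", "ai", 0.5)]))
print(format_books_block([]))
```

If you adopt something like this, build_prompt could call it in place of the loop that fills `lines`; whether a similarity cut-off is useful depends on how your search_books scores results.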
Now you’ll add a function that sends this prompt to an LLM (e.g. an OpenAI-compatible API).
Near the top of your file (or in a config section), add:
import os
import requests
import json

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "YOUR_API_KEY_HERE")
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

Set OPENAI_API_KEY in your environment if possible rather than hard-coding it.
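Because the config falls back to the "YOUR_API_KEY_HERE" placeholder, a missing environment variable only shows up later as an authentication error from the API. A minimal sketch of failing earlier with a clearer message is shown below; `require_api_key` is a hypothetical helper, and treating the placeholder string as "missing" is an assumption about how you want the lab to behave.

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.getenv(env_var, "").strip()
    # Treat both an unset variable and the lab's placeholder as missing.
    if not key or key == "YOUR_API_KEY_HERE":
        raise RuntimeError(
            f"Set the {env_var} environment variable before calling the LLM."
        )
    return key
```

You could then write `OPENAI_API_KEY = require_api_key()` in the config section, at the cost of the script refusing to start without a key.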
Add this function:
def query_llm(prompt: str, temperature: float = 0.4, max_tokens: int = 400) -> str:
    """
    Send the prompt to an LLM and return its response text.
    """
    headers = {
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant for book recommendations."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # The timeout keeps the script from hanging if the API does not respond.
    response = requests.post(OPENAI_URL, headers=headers, json=payload, timeout=60)
    # Raise an exception on 4xx/5xx responses instead of failing later.
    response.raise_for_status()
    data = response.json()
    # The reply text lives in the first choice's message.
    return data["choices"][0]["message"]["content"]

Now you’ll create one function that:
- Builds the prompt from the user’s question
- Sends it to the LLM
- Prints the answer
Add:
def answer_with_rag(user_query: str):
    # Retrieve books, build the prompt, then ask the LLM.
    prompt = build_prompt(user_query)
    answer = query_llm(prompt)
    print("\n=== USER QUESTION ===")
    print(user_query)
    print("\n=== PROMPT SENT TO LLM ===")
    print(prompt)
    print("\n=== LLM ANSWER (WITH RETRIEVED CONTEXT) ===")
    print(answer)

At the bottom of the file:

if __name__ == "__main__":
    question = "What are the best books to learn artificial intelligence from scratch?"
    answer_with_rag(question)

Run it.
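If the run fails with a transient problem such as a rate limit or a network hiccup, query_llm raises an exception (for HTTP errors, via response.raise_for_status()). One way to make the lab less brittle is a small retry wrapper. This is only a sketch: `call_with_retries` is a hypothetical helper, the attempt count and delays are arbitrary, and catching every Exception is deliberately blunt. A more careful version would retry only errors known to be transient.

```python
import time


def call_with_retries(fn, *args, attempts: int = 3, base_delay: float = 1.0, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # blunt on purpose; narrow this in real code
            last_error = exc
            # Sleep before every retry except after the final attempt.
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    # All attempts failed: surface the last error to the caller.
    raise last_error


# Intended use inside answer_with_rag (query_llm is defined in your lab file):
# answer = call_with_retries(query_llm, prompt)
```

With the defaults this waits 1 second and then 2 seconds between the three attempts.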
To see the impact of retrieval, you’ll compare:
- A plain LLM answer (no DB)
- A RAG answer (with search_books + retrieved context)
Add:
def answer_without_rag(user_query: str):
    # Same question, but with no retrieved books in the prompt.
    prompt = f"""
The user asked this question:
\"\"\"{user_query}\"\"\"
Recommend some books. You have NO access to our internal catalogue.
Just answer based on your general knowledge.
"""
    answer = query_llm(prompt)
    print("\n=== LLM ANSWER (NO RETRIEVAL) ===")
    print(answer)

Update the __main__ block:

if __name__ == "__main__":
    question = "What are the best books to learn artificial intelligence from scratch?"
    print("\n-----------------------------")
    print("WITHOUT retrieval")
    print("-----------------------------")
    answer_without_rag(question)
    print("\n-----------------------------")
    print("WITH retrieval (RAG)")
    print("-----------------------------")
    answer_with_rag(question)

Run it again.
For each run, note:
- Does the RAG answer actually mention the books from your DB?
- Does the non-RAG answer hallucinate books that aren’t in your table?
- Which answer would you trust more as “backed by our data”?
Write down one short sentence comparing the two.
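The first question in that list can be checked roughly in code rather than by eye. The sketch below reports which retrieved titles appear in an answer, using a case-insensitive substring match. `mentioned_titles` is a hypothetical helper, and it assumes you have the answer text as a string. The lab's answer_with_rag prints the answer rather than returning it, so you would need to capture it yourself. Substring matching will also miss paraphrased or abbreviated titles.

```python
def mentioned_titles(answer: str, results) -> list[str]:
    """Return the names from results that appear verbatim in the answer."""
    answer_lower = answer.lower()
    return [
        name
        for name, _subject, _similarity in results
        # Case-insensitive substring match on the book name.
        if name.lower() in answer_lower
    ]


# Made-up rows and answer text for illustration only.
example_results = [("Example AI Primer", "ai", 0.9), ("Example Web Guide", "web", 0.4)]
example_answer = "For beginners I would start with Example AI Primer."
print(mentioned_titles(example_answer, example_results))
```

Running the same check on the non-RAG answer gives a quick, if crude, signal of how much it overlaps with your catalogue.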
Change parameters in query_llm and re-run:
- Higher temperature:
  answer = query_llm(prompt, temperature=0.9)
- Shorter answer:
  answer = query_llm(prompt, max_tokens=150)
- Ask for JSON output (change the system message):
  {"role": "system", "content": "You are a helpful assistant. Respond in JSON only."}
For each change, check:
- Did it still use the retrieved books?
- Did it become clearer or messier?
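If you try the JSON-only system message, the reply still arrives as a plain string, and nothing guarantees it is valid JSON. A minimal sketch of parsing it defensively is shown below. `parse_json_answer` is a hypothetical helper, and the fence-stripping step reflects an assumption that the model may wrap its JSON in a Markdown code fence.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker, built without typing it literally


def parse_json_answer(answer: str):
    """Try to parse an LLM reply as JSON; return None if it is not valid JSON."""
    text = answer.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip().startswith(FENCE):
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON: let the caller decide how to handle it.
        return None


print(parse_json_answer('{"beginner": ["Example Book A"]}'))
print(parse_json_answer("Sorry, here is some prose instead."))
```

Returning None instead of raising keeps the experiment loop simple; you can print the raw answer whenever parsing fails and compare how often that happens at different temperatures.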
By the end of this lab, your single RAG file now supports:
- get_embedding → create query embeddings
- search_books → semantic search over your Postgres data
- build_prompt → inject retrieved results into a well-structured prompt
- query_llm → call an LLM API
- answer_with_rag / answer_without_rag → compare RAG vs non-RAG answers
This is a complete minimal RAG loop and sets you up for:
- Chunking labs (what exactly you embed/store)
- Prompt-injection labs (what happens if retrieved content is malicious?)
- Production concerns (latency, fallbacks, logging, evaluation)