@doingandlearning
Last active December 5, 2025 11:30

Lab 3 — Using Retrieved Books to Query an LLM

  • Time: 30–40 minutes
  • Goal: Take the semantic search you built and use its results as context for an LLM.

You will:

  • Re-use your existing search_books function
  • Build a prompt that includes retrieved books
  • Call an LLM with that prompt
  • Compare answers with and without retrieval

All changes happen in the same file you used for the similarity lab.


0. Prerequisites

Your file should already have:

  • get_embedding(text: str) -> list[float]
  • search_books(query: str, k: int = 5) that returns something like:
[(name, subject, similarity), ...]

You should be able to do:

results = search_books("how do I build a website?", k=3)
print(results)

and see a few books.

If that works, you’re ready.
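
The similarity value in each tuple comes from the SQL in search_books, which computes 1 minus pgvector's cosine distance (the <=> operator). If you want a feel for what that number means without touching the database, here is a minimal pure-Python sketch. cosine_similarity is a hypothetical helper for illustration only; it is not part of the lab file.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Same direction: similarity close to 1 (cosine distance close to 0).
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
    # Orthogonal vectors: similarity 0 (cosine distance 1).
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Vectors pointing the same way score near 1 and unrelated ones score near 0, which is why higher similarity values in your results mean closer matches.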


1. Build a Prompt From Retrieved Books

You’ll write a helper that:

  1. Takes the user’s question

  2. Calls search_books

  3. Builds a prompt string containing:

    • The user’s question
    • A short list of the most relevant books

Step 1.1 — Add build_prompt function

Add this function to your file:

def build_prompt(user_query: str, k: int = 5) -> str:
    """
    Use semantic search to find relevant books and build a prompt for the LLM.
    """
    results = search_books(user_query, k=k)

    lines = []
    for name, subject, similarity in results:
        lines.append(
            f"- {name} (subject: {subject}, similarity: {similarity:.3f})"
        )

    books_block = "\n".join(lines)

    prompt = f"""
You are a helpful assistant recommending technical books.

The user asked this question:
\"\"\"{user_query}\"\"\"

Here are some relevant books from our internal catalogue:
{books_block}

Using ONLY these books as your source of truth:
- Explain which books are most suitable.
- Group them into beginner / intermediate / advanced.
- Justify your choices briefly.

If none of the books seem relevant, say so.
"""
    return prompt
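
If search_books returns no rows, the book list in the prompt is simply blank, which can be confusing for both you and the model. One possible way to handle that, sketched here under the assumption that results keep the (name, subject, similarity) shape, is to move the formatting into a small helper. format_books_block is a hypothetical name, not something the lab requires.

```python
def format_books_block(results: list[tuple[str, str, float]]) -> str:
    """Turn search results into bullet lines, or an explicit 'nothing found' note.

    Hypothetical helper: assumes each result is (name, subject, similarity).
    """
    if not results:
        # Say so explicitly, so the model is not handed an empty list.
        return "(no matching books were found in the catalogue)"
    return "\n".join(
        f"- {name} (subject: {subject}, similarity: {similarity:.3f})"
        for name, subject, similarity in results
    )


if __name__ == "__main__":
    print(format_books_block([("Deep Learning", "artificial_intelligence", 0.81234)]))
    print(format_books_block([]))
```

You could then set books_block = format_books_block(results) inside build_prompt instead of building the lines by hand.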

Step 1.2 — Quick check

At the bottom of the file, temporarily add:

if __name__ == "__main__":
    q = "What are the best books on artificial intelligence?"
    print(build_prompt(q))

Run the script. You should see:

  • The original question
  • A bullet list of books
  • Clear instructions for the assistant

Once you’ve checked it, you can comment that block out or keep it for testing.


2. Add an LLM Query Function

Now you’ll add a function that sends this prompt to an LLM (e.g. via an OpenAI-compatible API).

Step 2.1 — Add API config

Near the top of your file (or in a config section), add:

import os
import requests
import json

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "YOUR_API_KEY_HERE")
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

Set OPENAI_API_KEY in your environment rather than hard-coding the key in the file.
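
With the placeholder default, a missing key only shows up later as an authentication error from the API. A small check that fails early can make that easier to diagnose. The sketch below is one way to do it; require_api_key is a hypothetical helper, and the env parameter exists only so the function can be tried without changing your real environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise a clear error if it is missing.

    Hypothetical helper; reads OPENAI_API_KEY from os.environ by default.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the lab's placeholder value the same as "not set".
    if not key or key == "YOUR_API_KEY_HERE":
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell before running the lab."
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```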

Step 2.2 — Add query_llm function

Add this function:

def query_llm(prompt: str, temperature: float = 0.4, max_tokens: int = 400) -> str:
    """
    Send the prompt to an LLM and return its response text.
    """
    headers = {
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "Content-Type": "application/json",
    }

    payload = {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant for book recommendations."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

    response = requests.post(OPENAI_URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    data = response.json()
    return data["choices"][0]["message"]["content"]
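
The last line of query_llm assumes the response always has the choices[0].message.content shape. If the API returns something else, you get a bare KeyError or IndexError. As a rough sketch, you could pull that lookup into a helper that reports the unexpected payload instead. extract_answer is a hypothetical name, and the sample dictionary below is hand-written to mirror the shape used above; it is not real API output.

```python
def extract_answer(data: dict) -> str:
    """Pull the assistant text out of a chat-completions style response dict.

    Hypothetical helper: assumes the shape data["choices"][0]["message"]["content"].
    """
    try:
        content = data["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {data!r}") from exc
    if not isinstance(content, str):
        raise ValueError(f"Response content is not text: {content!r}")
    return content.strip()


if __name__ == "__main__":
    sample = {"choices": [{"message": {"role": "assistant", "content": " Try book A. "}}]}
    print(extract_answer(sample))
```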

3. Wire It Together: answer_with_rag

Now you’ll create one function that:

  1. Builds the prompt from the user’s question
  2. Sends it to the LLM
  3. Prints the answer

Add:

def answer_with_rag(user_query: str):
    prompt = build_prompt(user_query)
    answer = query_llm(prompt)
    print("\n=== USER QUESTION ===")
    print(user_query)
    print("\n=== PROMPT SENT TO LLM ===")
    print(prompt)
    print("\n=== LLM ANSWER (WITH RETRIEVED CONTEXT) ===")
    print(answer)

At the bottom of the file:

if __name__ == "__main__":
    question = "What are the best books to learn artificial intelligence from scratch?"
    answer_with_rag(question)

Run it.


4. Compare: With vs Without Retrieval

To see the impact of retrieval, you’ll compare:

  • A plain LLM answer (no DB)
  • A RAG answer (with search_books + retrieved context)

Step 4.1 — Plain LLM answer

Add:

def answer_without_rag(user_query: str):
    # This string is the prompt text; query_llm builds the actual request payload.
    prompt = f"""
The user asked this question:

\"\"\"{user_query}\"\"\"

Recommend some books. You have NO access to our internal catalogue.
Just answer based on your general knowledge.
"""

    answer = query_llm(prompt)
    print("\n=== LLM ANSWER (NO RETRIEVAL) ===")
    print(answer)

Update the __main__ block:

if __name__ == "__main__":
    question = "What are the best books to learn artificial intelligence from scratch?"

    print("\n-----------------------------")
    print("WITHOUT retrieval")
    print("-----------------------------")
    answer_without_rag(question)

    print("\n-----------------------------")
    print("WITH retrieval (RAG)")
    print("-----------------------------")
    answer_with_rag(question)

Run it again.

Step 4.2 — Reflect

For each run, note:

  • Does the RAG answer actually mention the books from your DB?
  • Does the non-RAG answer hallucinate books that aren’t in your table?
  • Which answer would you trust more as “backed by our data”?

Write down one short sentence comparing the two.
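
If you would rather check the first question mechanically than by eye, a simple substring match over the retrieved titles gives a rough signal. This is only a sketch: mentioned_titles is a hypothetical helper, it assumes you have the answer text as a string (for example by returning it from answer_with_rag rather than only printing it), and plain substring matching will miss paraphrased titles and may match very short titles by accident.

```python
def mentioned_titles(answer: str, titles: list[str]) -> list[str]:
    """Return the titles that appear in the answer (case-insensitive substring match).

    Hypothetical helper for the reflection step; a rough check, not an evaluation metric.
    """
    lowered = answer.casefold()
    # Skip blank titles, which would otherwise match every answer.
    return [t for t in titles if t.strip() and t.casefold() in lowered]


if __name__ == "__main__":
    answer = "For beginners I would start with Deep Learning, then move on."
    print(mentioned_titles(answer, ["Deep Learning", "Python Crash Course"]))
```

In the lab file, the titles could come from the same search that built the prompt, e.g. [name for name, _, _ in search_books(question)].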


5. Parameter Experiments (Optional)

Change parameters in query_llm and re-run:

  1. Higher temperature:

    answer = query_llm(prompt, temperature=0.9)
  2. Shorter answer:

    answer = query_llm(prompt, max_tokens=150)
  3. Ask for JSON output (change system message):

    {"role": "system", "content": "You are a helpful assistant. Respond in JSON only."}

For each change, check:

  • Did it still use the retrieved books?
  • Did it become clearer or messier?
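
For the JSON experiment, "Respond in JSON only" is a request, not a guarantee: the model may still wrap its output in a Markdown code fence or add extra text. A minimal sketch for checking whether the reply actually parses is shown below. parse_json_answer is a hypothetical helper, and stripping a surrounding fence is an assumption about a common failure mode, not documented API behaviour.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_json_answer(text: str):
    """Try to parse the model's reply as JSON; return None if it is not valid JSON.

    Hypothetical helper: strips one surrounding Markdown code fence if present.
    """
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    print(parse_json_answer('{"beginner": ["Book A"]}'))
    print(parse_json_answer("Sure! Here are some books."))
```

A None result tells you the model ignored the JSON instruction for that run, which is worth noting alongside your other observations.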

6. What You’ve Just Built

By the end of this lab, your single RAG file now supports:

  • get_embedding → create query embeddings
  • search_books → semantic search over your Postgres data
  • build_prompt → inject retrieved results into a well-structured prompt
  • query_llm → call an LLM API
  • answer_with_rag / answer_without_rag → compare RAG vs non-RAG answers

This is a complete minimal RAG loop and sets you up for:

  • Chunking labs (what exactly you embed/store)
  • Prompt-injection labs (what happens if retrieved content is malicious?)
  • Production concerns (latency, fallbacks, logging, evaluation)

import requests
from time import sleep
import psycopg
import json

OLLAMA_URL = "http://nat-lin7.neueda.com:11434/api/embed"

DB_CONFIG = {
    "host": "nat-lin7.neueda.com",
    "port": 5432,
    "user": "postgres",
    "password": "postgres",
    "dbname": "pgvector"
}


def get_embedding(text):
    response = requests.post(OLLAMA_URL,
                             json={
                                 "model": "bge-m3",
                                 "input": text})
    data = response.json()
    embedding = data["embeddings"][0]
    return embedding


def fetch_books():
    """Fetch books across various subjects from Open Library."""
    categories = [
        "programming",
        "web_development",
        "artificial_intelligence",
        "computer_science",
        "software_engineering",
    ]
    all_books = []
    for category in categories:
        url = f"https://openlibrary.org/subjects/{category}.json?limit=10"
        response = requests.get(url)
        response.raise_for_status()  # Raises an error for a bad response
        data = response.json()
        books = data.get("works", [])
        # Format each book
        for book in books:
            book_data = {
                "title": book.get("title", "Untitled"),
                "authors": [
                    author.get("name", "Unknown Author")
                    for author in book.get("authors", [])
                ],
                "first_publish_year": book.get("first_publish_year", "Unknown"),
                "subject": category,
            }
            all_books.append(book_data)
        print(f"Successfully processed {len(books)} books for {category}")
    if not all_books:
        print("No books were fetched from any category.")
    return all_books


def load_books_to_db():
    """Load books with embeddings into PostgreSQL."""
    # Wait for the database to be ready
    sleep(5)
    # Connect to the database
    conn = psycopg.connect(**DB_CONFIG)
    cur = conn.cursor()
    # Fetch data from the Open Library
    books = fetch_books()
    for book in books:
        description = (
            f"Book titled '{book['title']}' by {', '.join(book['authors'])}. "
            f"Published in {book['first_publish_year']}. "
            f"This is a book about {book['subject']}."
        )
        # Generate embedding
        # embedding = "[" + ",".join(["0"] * 1536) + "]"  # Placeholder embedding
        sleep(1)
        embedding = get_embedding(description)
        cur.execute(
            """
            INSERT INTO items (name, item_data, embedding)
            VALUES (%s, %s, %s)
            """,
            (book["title"], json.dumps(book), embedding),
        )
    # Commit and close
    conn.commit()
    cur.close()
    conn.close()


def search_books(query: str, k: int = 5):
    embedding = get_embedding(query)
    with psycopg.connect(**DB_CONFIG) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                WITH q AS (SELECT %s::vector AS v)
                SELECT
                    name,
                    item_data->>'subject' AS subject,
                    1 - (embedding <=> q.v) AS similarity
                FROM items, q
                WHERE (embedding <=> q.v) < 0.5
                ORDER BY embedding <=> q.v
                LIMIT %s;
            """, (embedding, k))
            return cur.fetchall()


if __name__ == "__main__":
    tests = [
        "how to bake a cookie in india"
    ]
    for t in tests:
        print(f"\nQuery: {t}")
        for name, subject, sim in search_books(t):
            print(f"  {name} [{subject}] — similarity={sim:.3f}")