In this tutorial, we'll explore a Python system that orchestrates multiple agents to enable question answering over PDF documents enhanced by live web data retrieval. It's an example of a Retrieval-Augmented Generation (RAG) system with extensible agents and verbose logging to understand the workflow.
The system consists of several collaborating agents, each responsible for a distinct step in the process:
- PDFLoaderAgent: Loads and splits a PDF into token chunks.
- EmbeddingAgent: Converts chunks into embeddings and indexes them with FAISS.
- RetrievalAgent: Searches the indexed chunks for the most relevant ones given a query.
- QAAgent: Uses a chat model to answer questions using retrieved context.
- WebSurfingAgent: Queries an external server (MCP Server) to get live web info if the document context is insufficient.
- RAGOrchestrator: Coordinates the agents, manages ingestion and querying logic.
This modular approach boosts flexibility and debuggability. We also have verbose logging in each agent to trace the execution.
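To make the agent boundaries concrete, here is a dependency-free sketch of how the orchestrator might wire the agents together. The class and method names follow the article, but the bodies are placeholder stubs, not the real implementations:

```python
from typing import List

class PDFLoaderAgent:
    def load_and_split(self, path: str) -> List[str]:
        # Stub: the real agent extracts and chunks PDF text
        return [f"chunk from {path}"]

class RetrievalAgent:
    def retrieve(self, query: str, texts: List[str], k: int = 5) -> List[str]:
        # Stub: the real agent runs a FAISS similarity search
        return texts[:k]

class QAAgent:
    def answer(self, question: str, context: List[str]) -> str:
        # Stub: the real agent calls a chat model
        return f"answer to {question!r} using {len(context)} chunk(s)"

class RAGOrchestrator:
    def __init__(self):
        self.loader = PDFLoaderAgent()
        self.retriever = RetrievalAgent()
        self.qa = QAAgent()
        self.text_chunks: List[str] = []

    def ingest(self, path: str) -> None:
        # Ingestion: load, split, and remember the chunks
        self.text_chunks.extend(self.loader.load_and_split(path))

    def query(self, question: str) -> str:
        # Querying: retrieve relevant chunks, then answer over them
        ctx = self.retriever.retrieve(question, self.text_chunks)
        return self.qa.answer(question, ctx)

orchestrator = RAGOrchestrator()
orchestrator.ingest("report.pdf")
print(orchestrator.query("What is the summary?"))
```

Because each agent exposes a small, explicit interface, you can swap in the real implementations (or mocks for testing) without touching the orchestrator.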
The PDFLoaderAgent loads PDFs with PyPDF2, extracts text, and splits it into overlapping chunks of approximately 500 tokens using OpenAI’s tokenizer.
```python
def load_and_split(self, path: str) -> List[str]:
    # Extract text from all pages
    full_text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    tokens = ENC.encode(full_text)
    # Create overlapping chunks
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + self.chunk_size, len(tokens))
        chunk = ENC.decode(tokens[start:end])
        chunks.append(chunk)
        start += self.chunk_size - self.chunk_overlap
    return chunks
```

This enables the system to work on manageable text pieces for embedding and retrieval.
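The sliding-window arithmetic is easy to verify in isolation. This dependency-free sketch (the helper name `chunk_tokens` is illustrative) applies the same logic to a plain list, so you can check the overlap without a tokenizer; it assumes `chunk_overlap < chunk_size`, otherwise the loop would never advance:

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    # Same sliding-window logic as load_and_split, over any sequence;
    # assumes chunk_overlap < chunk_size so the window always advances
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        start += chunk_size - chunk_overlap
    return chunks

tokens = list(range(10))
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
# Each chunk starts 3 positions after the previous one,
# so consecutive chunks share exactly 1 token
print(chunks)  # → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9], [9]]
```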
The EmbeddingAgent uses the OpenAI embeddings API to get vector representations of chunks, then builds a FAISS index for similarity search.
```python
def embed(self, texts: List[str]) -> List[List[float]]:
    response = openai.embeddings.create(model=EMBED_MODEL, input=texts)
    return [item.embedding for item in response.data]

def add_to_index(self, texts: List[str]):
    embeddings = self.embed(texts)
    vecs = np.array(embeddings, dtype="float32")
    self.index.add(vecs)
```

Verbose logs indicate when embeddings are created and added to the index.
RetrievalAgent finds the most similar chunks for a new question by embedding the query and performing a FAISS search.
```python
def retrieve(self, query: str, texts: List[str], k: int = 5) -> List[str]:
    # Embed the query with the same model used for the chunks
    q_emb = EmbeddingAgent().embed([query])[0]
    D, I = self.index.search(np.array([q_emb], dtype="float32"), k)
    # FAISS pads missing neighbors with -1, so filter those out too
    return [texts[i] for i in I[0] if 0 <= i < len(texts)]
```

This dynamic retrieval supports up-to-date context per query.
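A flat FAISS index (`IndexFlatL2`) performs exact L2 nearest-neighbor search. Conceptually, it is equivalent to this small NumPy sketch (the function name `l2_search` is illustrative), which can help when debugging retrieval results:

```python
import numpy as np

def l2_search(index_vecs: np.ndarray, query: np.ndarray, k: int):
    # Squared L2 distance from the query to every indexed vector
    dists = ((index_vecs - query) ** 2).sum(axis=1)
    # Indices of the k closest vectors, nearest first
    order = np.argsort(dists)[:k]
    return dists[order], order

vecs = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], dtype="float32")
query = np.array([0.9, 0.1], dtype="float32")
dists, ids = l2_search(vecs, query, k=2)
print(ids)  # nearest vectors first
```

FAISS does the same computation with heavily optimized kernels, and its approximate index types trade a little accuracy for much faster search on large collections.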
QAAgent sends the question and context as a prompt to the OpenAI chat completion API for the answer.
```python
def answer(self, question: str, context: List[str]) -> str:
    context_str = '---\n'.join(context)
    prompt = (
        f"You are an expert assistant. Use the following context to answer the question.\n\n"
        f"Context:\n{context_str}\n\n"
        f"Question: {question}\nAnswer:"
    )
    resp = openai.chat.completions.create(
        model=self.model,
        messages=[{"role": "system", "content": prompt}],
        temperature=0.2,
        max_tokens=500,
    )
    return resp.choices[0].message.content.strip()
```

Verbose printing displays prompt length and answer length for debugging.
If the ingested documents don't provide sufficient answers, the WebSurfingAgent fetches additional info from an MCP Server — a web API endpoint serving live internet data.
```python
def fetch_online_info(self, query: str) -> str:
    try:
        resp = requests.get(f"{self.base_url}/search", params={"q": query}, timeout=10)
        resp.raise_for_status()
        return resp.text
    except Exception as e:
        print(f"[WebSurfingAgent] Failed to fetch data: {e}")
        return ""
```

RAGOrchestrator wires all agents together. Its querying logic is:
- Retrieve an answer from the document chunks.
- Check answer quality heuristically.
- If the answer is insufficient and web search is enabled, query the WebSurfingAgent.
- Use the combined context for a final answer.
```python
def query(self, question: str) -> str:
    ctx = self.retriever.retrieve(question, self.text_chunks)
    answer = self.qa.answer(question, ctx)
    if self.use_web_agent and self.web_agent:
        insufficient = (len(answer) < 50) or any(
            phrase in answer.lower() for phrase in [
                "don't know", "do not know", "no information", "no relevant", "not found"
            ]
        )
        if insufficient:
            online_data = self.web_agent.fetch_online_info(question)
            if online_data:
                combined_context = ctx + [online_data]
                answer = self.qa.answer(question, combined_context)
    return answer
```

Verbose print statements trace each step.
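The sufficiency check above can be pulled out and exercised on its own. This sketch mirrors the orchestrator's heuristic (a short answer or an "I don't know"-style phrase triggers the web fallback); the helper name `is_insufficient` is illustrative:

```python
FALLBACK_PHRASES = [
    "don't know", "do not know", "no information", "no relevant", "not found"
]

def is_insufficient(answer: str, min_length: int = 50) -> bool:
    # Very short answers or explicit "no answer" phrases trigger the web fallback
    text = answer.lower()
    return len(answer) < min_length or any(p in text for p in FALLBACK_PHRASES)

print(is_insufficient("I don't know."))  # True: fallback phrase
print(is_insufficient("The report projects strong subscription-driven revenue growth next year."))  # False
```

Heuristics like this are crude but cheap; a natural extension is to ask the model itself to grade the answer before deciding whether to hit the web.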
The accompanying streamlit_app.py integrates this system. You upload PDFs, ingest them, and ask questions interactively.
This RAG system demonstrates how to combine document retrieval, vector search, live web data integration, and powerful language models to build intelligent assistants. Each agent performs a crucial role, and orchestration ensures smooth interplay. Verbose logging helps understand the flow and debug.
Feel free to adapt it, extend it with LangGraph orchestration for more complex workflows, or connect it to richer web APIs. This modular architecture is a solid foundation for AI-powered knowledge systems.