@abvijaykumar
Last active June 10, 2025 03:12
Vibe Coding

Building an Agentic RAG System with Document & Web Retrieval Using Langchain and FAISS

In this tutorial, we'll explore a Python system that orchestrates multiple agents to answer questions over PDF documents, augmented with live web data retrieval. It's an example of a Retrieval-Augmented Generation (RAG) system with extensible agents and verbose logging that makes the workflow easy to follow.

Overview of the Architecture

The system consists of several collaborating agents, each responsible for a distinct step in the process:

  • PDFLoaderAgent: Loads and splits a PDF into token chunks.
  • EmbeddingAgent: Converts chunks into embeddings and indexes them with FAISS.
  • RetrievalAgent: Searches the indexed chunks for the most relevant ones given a query.
  • QAAgent: Uses a chat model to answer questions using retrieved context.
  • WebSurfingAgent: Queries an external server (MCP Server) to get live web info if the document context is insufficient.
  • RAGOrchestrator: Coordinates the agents, manages ingestion and querying logic.

This modular approach boosts flexibility and debuggability. We also have verbose logging in each agent to trace the execution.
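Before diving into each agent, the collaboration can be sketched abstractly. The stub classes below mirror the agent names above, but their bodies are invented for illustration (the real agents call PyPDF2, OpenAI, and FAISS); the sketch only shows the order in which the orchestrator invokes them, and it omits the web-agent fallback for brevity:

```python
# Minimal sketch of the agent pipeline; external dependencies are stubbed out.
class PDFLoaderAgent:
    def load_and_split(self, path):
        return [f"chunk from {path}"]      # stub: real agent splits into ~500-token chunks

class EmbeddingAgent:
    def add_to_index(self, texts):
        self.texts = texts                 # stub: real agent builds a FAISS index

class RetrievalAgent:
    def retrieve(self, query, texts, k=5):
        return texts[:k]                   # stub: real agent does a vector search

class QAAgent:
    def answer(self, question, context):
        return f"answer to {question!r} using {len(context)} chunk(s)"

class RAGOrchestrator:
    def __init__(self):
        self.loader, self.embedder = PDFLoaderAgent(), EmbeddingAgent()
        self.retriever, self.qa = RetrievalAgent(), QAAgent()
        self.text_chunks = []

    def ingest(self, path):
        # Ingestion: load and split the PDF, then index the chunks.
        self.text_chunks = self.loader.load_and_split(path)
        self.embedder.add_to_index(self.text_chunks)

    def query(self, question):
        # Querying: retrieve relevant chunks, then answer from that context.
        ctx = self.retriever.retrieve(question, self.text_chunks)
        return self.qa.answer(question, ctx)
```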

Loading and Splitting PDFs

The PDFLoaderAgent loads PDFs with PyPDF2, extracts text, and splits it into overlapping chunks of approximately 500 tokens using OpenAI’s tokenizer.

```python
def load_and_split(self, path: str) -> List[str]:
    # Extract text from all pages
    full_text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    tokens = ENC.encode(full_text)
    # Create overlapping chunks (chunk_overlap must be smaller than chunk_size)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + self.chunk_size, len(tokens))
        chunk = ENC.decode(tokens[start:end])
        chunks.append(chunk)
        start += self.chunk_size - self.chunk_overlap
    return chunks
```

This enables the system to work on manageable text pieces for embedding and retrieval.
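The sliding-window arithmetic is easiest to see on a toy token list. Here integers stand in for token ids, and the chunk size and overlap are illustrative values, not the agent's actual configuration:

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    # Same sliding-window logic as load_and_split, applied to raw token ids.
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        start += chunk_size - chunk_overlap  # stride; overlap must be < chunk_size
    return chunks

chunks = chunk_tokens(list(range(12)), chunk_size=5, chunk_overlap=2)
# Each chunk shares its last 2 tokens with the next chunk's first 2,
# so context at chunk boundaries is never lost.
```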

Creating Embeddings and Indexing with FAISS

The EmbeddingAgent uses the OpenAI embeddings API to get vector representations of chunks, then builds a FAISS index for similarity search.

```python
def embed(self, texts: List[str]) -> List[List[float]]:
    response = openai.embeddings.create(model=EMBED_MODEL, input=texts)
    return [item.embedding for item in response.data]

def add_to_index(self, texts: List[str]):
    embeddings = self.embed(texts)
    vecs = np.array(embeddings, dtype="float32")
    self.index.add(vecs)
```

Verbose logs indicate when embeddings are created and added to the index.
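The gist doesn't show which FAISS index type the agent builds, but a flat L2 index is the natural default for this workload. `IndexFlatL2` performs exact nearest-neighbor search by squared L2 distance, which on tiny data is equivalent to this brute-force NumPy computation:

```python
import numpy as np

def l2_search(index_vecs: np.ndarray, query: np.ndarray, k: int):
    # Squared L2 distance from the query to every indexed vector,
    # then the k smallest -- the same result IndexFlatL2.search produces.
    dists = ((index_vecs - query) ** 2).sum(axis=1)
    order = np.argsort(dists)[:k]
    return dists[order], order

vecs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype="float32")
D, I = l2_search(vecs, np.array([0.9, 0.1], dtype="float32"), k=2)
# I[0] == 1: the vector [1, 0] is closest to the query.
```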

Retrieving Relevant Context

RetrievalAgent finds the most similar chunks for a new question by embedding the query and performing a FAISS search.

```python
def retrieve(self, query: str, texts: List[str], k: int = 5) -> List[str]:
    q_emb = EmbeddingAgent().embed([query])[0]
    D, I = self.index.search(np.array([q_emb], dtype="float32"), k)
    # FAISS pads results with -1 when fewer than k vectors exist,
    # so guard both bounds before indexing into texts.
    return [texts[i] for i in I[0] if 0 <= i < len(texts)]
```

This dynamic retrieval supports up-to-date context per query.

Generating Answers with Chat Completion

QAAgent sends the question and context as a prompt to the OpenAI chat completion API for the answer.

```python
def answer(self, question: str, context: List[str]) -> str:
    context_str = '---\n'.join(context)
    prompt = (
        f"You are an expert assistant. Use the following context to answer the question.\n\n"
        f"Context:\n{context_str}\n\n"
        f"Question: {question}\nAnswer:"
    )
    resp = openai.chat.completions.create(
        model=self.model,
        messages=[{"role": "system", "content": prompt}],
        temperature=0.2,
        max_tokens=500,
    )
    return resp.choices[0].message.content.strip()
```

Verbose printing displays prompt length and answer length for debugging.

Extending with WebSurfing Agent for Live Data

If the ingested documents don't provide sufficient answers, the WebSurfingAgent fetches additional info from an MCP Server — a web API endpoint serving live internet data.

```python
def fetch_online_info(self, query: str) -> str:
    try:
        resp = requests.get(f"{self.base_url}/search", params={"q": query}, timeout=10)
        resp.raise_for_status()
        return resp.text
    except Exception as e:
        print(f"[WebSurfingAgent] Failed to fetch data: {e}")
        return ""
```

Orchestrating the Workflow

RAGOrchestrator wires all agents together. Its querying logic is:

  1. Retrieve answer from document chunks.
  2. Check answer quality heuristically.
  3. If insufficient and enabled, query WebSurfingAgent.
  4. Use combined context for a final answer.

```python
def query(self, question: str) -> str:
    ctx = self.retriever.retrieve(question, self.text_chunks)
    answer = self.qa.answer(question, ctx)
    if self.use_web_agent and self.web_agent:
        insufficient = (len(answer) < 50) or any(
            phrase in answer.lower() for phrase in [
                "don't know", "do not know", "no information", "no relevant", "not found"
            ]
        )
        if insufficient:
            online_data = self.web_agent.fetch_online_info(question)
            if online_data:
                combined_context = ctx + [online_data]
                answer = self.qa.answer(question, combined_context)
    return answer
```

Verbose print statements trace each step.
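The quality check in step 2 can also be pulled out into a small standalone helper; the 50-character threshold and the phrase list below come straight from the `query` method above:

```python
INSUFFICIENT_PHRASES = [
    "don't know", "do not know", "no information", "no relevant", "not found",
]

def is_insufficient(answer: str, min_length: int = 50) -> bool:
    # An answer is "insufficient" if it is very short or hedges with
    # one of the known refusal phrases -- the same heuristic as in query().
    lower = answer.lower()
    return len(answer) < min_length or any(p in lower for p in INSUFFICIENT_PHRASES)
```

Extracting the heuristic this way makes it easy to unit-test and to tune the threshold or phrase list independently of the orchestrator.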

Using the Setup with Streamlit

The accompanying streamlit_app.py integrates this system. You upload PDFs, ingest them, and ask questions interactively.

Conclusion

This RAG system demonstrates how to combine document retrieval, vector search, live web data integration, and powerful language models to build intelligent assistants. Each agent performs a crucial role, and orchestration ensures smooth interplay. Verbose logging makes the flow easy to follow and debug.

Feel free to adapt it, extend it with LangGraph orchestration for more complex workflows, or connect it to richer web APIs. This modular architecture is a solid foundation for AI-powered knowledge systems.


