A typical RAG system quickly searches your uploaded tax documents or technical manuals to find a relevant snippet, even for something as simple as a casual greeting. Agentic RAG works differently. Instead of always looking up information, the agent stops to consider whether it really needs to search or if it can answer on its own. In this article, I’ll show you how to build an Agentic RAG pipeline using Python and LangChain.
Agentic RAG Pipeline: Getting Started
In this guide, we’ll build a local, privacy-friendly Agentic RAG pipeline using Python, LangChain, and a lightweight Google model. We’ll go beyond just writing code to create a system that acts a bit more like a person.
We’ll use LangChain to manage the process, ChromaDB for storing vectors, and Google’s Flan-T5 as our local language model. Everything runs on your own computer, so you don’t need any API keys.
You will need a few libraries installed. In your terminal:
pip install langchain langchain-community langchain-chroma transformers sentence-transformers pypdf
Step 1: Loading the Knowledge
First, we need to give our AI something to read. We use a function to scan a folder for PDFs:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from transformers import pipeline
# Load PDFs from a folder
def load_docs(folder_path):
    docs = []
    for file in os.listdir(folder_path):
        if file.endswith(".pdf"):
            loader = PyPDFLoader(os.path.join(folder_path, file))
            docs.extend(loader.load())
    return docs
# Update this path to where your PDFs are stored
docs = load_docs("/Users/amankharwal/aiagent/data")
print("PDF Pages Loaded:", len(docs))
In this step, we go through a folder, find the PDF files, and load them one page at a time. It’s like stacking books on your desk before you start studying.
Step 2: Chunking
LLMs can only read a certain amount of text at once, called the context window. Even if they could handle more, giving them a whole 500-page book to answer one question isn’t efficient:
# Split PDFs into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=80
)
chunks = text_splitter.split_documents(docs)
print("Chunks Created:", len(chunks))
Notice chunk_overlap=80. Instead of just cutting the text, we let the chunks overlap a bit. This way, sentences aren’t split in half at the edge of a chunk, so the meaning (or semantic context) is preserved across breaks.
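To see what overlap actually does, here is a toy character-based splitter in plain Python. It is only a sketch of the idea; the sample sentence, the chunk_size of 20, and the overlap of 8 are made up for illustration, and RecursiveCharacterTextSplitter is smarter because it prefers to split on separators like newlines and spaces:

```python
# Toy splitter: each chunk starts (chunk_size - overlap) characters
# after the previous one, so neighbouring chunks share `overlap` characters.
def split_chars(text, chunk_size, overlap):
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

text = "The agent decides whether it needs to search the documents."
no_overlap = split_chars(text, chunk_size=20, overlap=0)
with_overlap = split_chars(text, chunk_size=20, overlap=8)

print(no_overlap[0])    # first 20 characters, cut mid-word
print(with_overlap[1])  # repeats the last 8 characters of the first chunk
```

With overlap, the second chunk re-reads the tail of the first one, so a sentence that straddles the boundary is still seen in one piece by at least one chunk.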
Step 3: Embeddings & Vector Store
Now we convert text into numbers, called vectors, that the computer can understand. We store these in Chroma, which is a vector database:
# Embeddings
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
# Save texts into Chroma vector DB
texts = [c.page_content for c in chunks]
db = Chroma(
    collection_name="rag_store",
    embedding_function=embedding_model
)
db.add_texts(texts)
# Retriever
retriever = db.as_retriever(search_kwargs={"k": 3})
We’re using all-MiniLM-L6-v2. It’s a small, fast model that’s great for local development. It puts similar concepts close together in vector space. When we search later, we won’t be matching keywords; we’ll be matching meanings.
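“Matching meanings” comes down to measuring how close two vectors are, usually with cosine similarity. Here is a hand-rolled sketch of that measure; the three-dimensional vectors for “cat”, “kitten”, and “invoice” are made up for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors:
    # 1.0 = same direction, 0.0 = unrelated, -1.0 = opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy embeddings: "cat" and "kitten" point in similar
# directions; "invoice" points somewhere else entirely.
cat = [0.9, 0.1, 0.2]
kitten = [0.8, 0.2, 0.25]
invoice = [0.1, 0.9, -0.4]

print(cosine_similarity(cat, kitten))   # close to 1.0
print(cosine_similarity(cat, invoice))  # much lower
```

This is exactly what the retriever does under the hood: it embeds your query, then returns the k chunks whose vectors score highest against it.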
Step 4: The Brain
We need a model to generate the actual answers. We are using google/flan-t5-base:
# Local LLM
llm = pipeline(
    "text2text-generation",
    model="google/flan-t5-base",
    max_new_tokens=150
)
Flan-T5 is a “seq2seq” model. It’s great at following instructions like “Summarize this” or “Answer this,” which makes it ideal for RAG tasks even though it’s smaller.
Step 5: The Agent
This is the key part. This simple function is what makes the pipeline Agentic:
# Agent brain
def agent_controller(query):
    q = query.lower()
    if any(word in q for word in ["pdf", "document", "data", "summarize", "information", "find"]):
        return "search"
    return "direct"
Instead of sending everything to the database, this controller analyzes the user’s intent:
- Does the user want data from the file? Action: Search
- Is the user just chatting or asking for general knowledge? Action: Direct
In a production system, you might use a powerful LLM to make this decision. But for learning, this keyword-based method is a great way to show the Routing Pattern in agentic AI.
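To make that upgrade concrete, here is a minimal sketch of an LLM-based router. The prompt wording and the llm_router function are my assumptions, not part of the original pipeline; the llm argument is any callable that takes a prompt and returns text, so you could wrap the Flan-T5 pipeline from Step 4 as shown in the comment:

```python
def llm_router(llm, query):
    # Ask the model itself to classify the query. The prompt wording
    # here is a hypothetical example, not a tuned production prompt.
    prompt = (
        "Decide if this question needs information from the user's "
        "documents. Reply with exactly one word: search or direct.\n"
        f"Question: {query}"
    )
    reply = llm(prompt).strip().lower()
    # Fall back to "direct" if the model answers off-script.
    return "search" if "search" in reply else "direct"

# With the pipeline from Step 4 you could wrap it like:
#   action = llm_router(lambda p: llm(p)[0]["generated_text"], query)

# Stub LLM so the routing logic can be demonstrated without a model:
print(llm_router(lambda p: "Search", "Summarize the PDF"))  # search
print(llm_router(lambda p: "Direct.", "Hello there!"))      # direct
```

The trade-off is cost: the keyword router is free and instant, while the LLM router adds one extra model call per query in exchange for much better intent understanding.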
Step 6: The Execution Loop
Finally, we tie it all together:
# RAG
def rag_answer(query):
    action = agent_controller(query)
    if action == "search":
        print(f"🕵️ Agent decided to SEARCH document for: '{query}'")
        results = retriever.invoke(query)
        context = "\n".join([r.page_content for r in results])
        final_prompt = f"Use this context:\n{context}\n\nAnswer:\n{query}"
    else:
        print(f"🤖 Agent decided to answer DIRECTLY: '{query}'")
        final_prompt = query
    response = llm(final_prompt)[0]["generated_text"]
    return response
# Test 1: A document-specific question
query = "Give me a 5-point summary from the PDF"
print(rag_answer(query))
print("-" * 20)
# Test 2: A general knowledge question
print(rag_answer("What is an Ideal Resume Format? Explain in 50 words."))
In the first case, the agent sees the word “PDF” or “summary” and uses the retriever. In the second case, it knows it doesn’t need your documents to explain a resume format, so it answers using its own pre-trained knowledge.
Closing Thoughts
When I first started working with AI, I thought bigger was better. I wanted the largest model and the biggest database. But over time, I learned that intelligence is really about efficiency, not just raw power.
By building this Agentic router, you save computing resources and reduce latency. Most importantly, you create a system that respects the user’s context.
If you found this article useful, you can follow me on Instagram for daily AI tips and practical resources. You might also like my latest book, Hands-On GenAI, LLMs & AI Agents. It’s a step-by-step guide to help you get ready for jobs in today’s AI field.