Building a Multi-Document RAG System

RAG connects the powerful reasoning of an LLM with the unique information in your own documents. Today, I’ll teach you how to build a Multi-Document RAG System using Python. By the end, you’ll have an app that reads a folder of documents and answers your questions accurately.

Multi-Document RAG System: Getting Started

We are going to build a Multi-Document RAG system from scratch using Python, LangChain, and Ollama. It sounds complex, but I promise you, it’s just a series of logical steps.

We’ll use LangChain for orchestration, Chroma for storage, and Ollama to run the Llama 3 model locally.

First, install these libraries. In your terminal, run:

pip install langchain langchain-community langchain-huggingface langchain-chroma langchain-ollama pypdf

You’ll also need Ollama running locally with the Llama 3 model. After installing Ollama, run:

ollama pull llama3

Step 1: Loading the Raw Knowledge

First, gather your source materials. We need to extract text from PDF files. PyPDFLoader is a good choice because it handles the tricky formatting of PDFs well:

import os
from langchain_community.document_loaders import PyPDFLoader

def load_documents(folder_path: str):
    if not os.path.exists(folder_path):
        raise FileNotFoundError(f"Folder '{folder_path}' does not exist")

    documents = []
    for filename in os.listdir(folder_path):
        if filename.endswith(".pdf"):
            file_path = os.path.join(folder_path, filename)
            print(f"📄 Loading: {filename}")
            try:
                loader = PyPDFLoader(file_path)
                documents.extend(loader.load())
            except Exception as e:
                print(f"❌ Error loading {filename}: {e}")
    return documents

Data is rarely perfect. Make sure your loading logic skips non-PDFs and handles errors, so your pipeline keeps running even if one file is bad.

Step 2: Chunking

You can’t give a 100-page document to an LLM all at once because it exceeds the model’s context window. So, we need to break it into smaller parts.

We use RecursiveCharacterTextSplitter, which tries to split text by paragraphs first, then by sentences, so related text stays together:

from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_text(documents):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
    )
    chunks = splitter.split_documents(documents)
    print(f"✂️ Created {len(chunks)} chunks")
    return chunks

Pay attention to chunk_overlap=200. This setting is important because it creates a sliding window, making sure you don’t lose context if a sentence is split between chunks.
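To see what the overlap does, here’s a simplified, pure-Python sketch of the sliding-window idea. This is not how RecursiveCharacterTextSplitter actually works (it splits on paragraph and sentence separators first), just an illustration of why consecutive chunks share text:

```python
def naive_chunk(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Slide a fixed-size window over the text, stepping by chunk_size - chunk_overlap."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the window already covers the end of the text
    return chunks

text = "".join(str(i % 10) for i in range(2500))
chunks = naive_chunk(text, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])             # → [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # → True: the last 200 chars repeat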

Step 3: Embeddings

Computers work with numbers, not words. So, we need to turn each text chunk into a list of numbers, called a vector or embedding.

If two chunks have similar meanings, like “dog” and “puppy,” their vectors end up close together in this numeric space:

from langchain_huggingface import HuggingFaceEmbeddings

embedding_function = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

We’ll use all-MiniLM-L6-v2, a lightweight, open-source model that runs quickly on your CPU.
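To build intuition for what “close” means here, this toy sketch measures closeness with cosine similarity, the metric vector stores typically use. The vectors below are invented for illustration; real MiniLM embeddings have 384 dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for illustration only.
dog     = [0.9, 0.8, 0.1]
puppy   = [0.85, 0.75, 0.2]
invoice = [0.1, 0.2, 0.9]

print(cosine_similarity(dog, puppy))    # close to 1.0
print(cosine_similarity(dog, invoice))  # much lower
```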

Step 4: The Vector Store

Now we need somewhere to store these vectors for fast searching. A regular SQL database isn’t built for similarity search over high-dimensional vectors, so we’ll use a vector database. We’ll use Chroma:

from langchain_chroma import Chroma

def create_vector_store(chunks):
    vector_store = Chroma.from_documents(
        documents=chunks,
        embedding=embedding_function,
        persist_directory="./chroma_db",
        collection_name="rag_docs"
    )
    return vector_store

This function saves the database in a folder called ./chroma_db. That way, you don’t have to rebuild the database every time you restart the app; it stays saved.

Step 5: The Brain

This is the most important part. This function links the user, the database, and the LLM:

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def query_rag_system(query_text, vector_store):
    llm = ChatOllama(model="llama3") # Make sure you have Ollama installed and running!

    retriever = vector_store.as_retriever(search_kwargs={"k": 3})

    prompt = ChatPromptTemplate.from_template(
        """
        You are a helpful assistant.
        Answer ONLY using the context below.
        If the answer is not present, say "I don't know."

        Context:
        {context}

        Question:
        {question}
        """
    )

    chain = (
        {
            "context": retriever | format_docs,
            "question": RunnablePassthrough(),
        }
        | prompt
        | llm
        | StrOutputParser()
    )

    return chain.invoke(query_text)

First, it takes the user’s question and retrieves the top 3 most relevant chunks (k=3). Then, it places those chunks inside a strict prompt: “Answer ONLY using the context below.” Grounding the answer in retrieved context like this helps stop the AI from making things up.
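Under the hood, “finding the top 3 chunks” is just a nearest-neighbor search over the embeddings. Here’s a toy, pure-Python sketch of that idea with made-up 2-D vectors; Chroma does this far more efficiently, but the principle is the same:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k(query_vec, chunk_vecs, k=3):
    """Rank chunk indices by cosine similarity to the query and keep the best k."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Made-up 2-D vectors standing in for real chunk embeddings.
chunk_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [0.1, 0.9]]
print(top_k([1.0, 0.05], chunk_vectors, k=3))  # → [0, 1, 3]
```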

Step 6: Putting It All Together

Finally, the main loop checks if a database already exists. If it doesn’t, it processes the PDFs. Then, it starts a chat loop so you can ask questions:

def main():
    folder_path = "/Users/amankharwal/aiagent/data" # CHANGE THIS to your folder path

    if not os.path.exists("./chroma_db"):
        print("📦 No vector DB found. Creating one...")
        docs = load_documents(folder_path)
        chunks = split_text(docs)
        vector_store = create_vector_store(chunks)
        print("Vector database created")
    else:
        print("📦 Loading existing vector DB...")
        vector_store = Chroma(
            persist_directory="./chroma_db",
            embedding_function=embedding_function,
            collection_name="rag_docs"
        )

    while True:
        query = input("\n❓ Ask a question (or type 'exit'): ")
        if query.lower() == "exit":
            break

        print("🤔 Thinking...")
        answer = query_rag_system(query, vector_store)
        print("\n🧠 Answer:\n", answer)

if __name__ == "__main__":
    main()

Here’s the answer I got when I asked about my and my friends’ resumes:

[Screenshot: Multi-Document RAG System final output]

Closing Thoughts

Building systems like this shows me that AI isn’t meant to replace our curiosity; it helps fuel it. When it’s easier to find answers, we can ask better, deeper, and more creative questions.

Don’t be afraid to experiment with this code. Try changing the chunk size, swap llama3 for Mistral, or use a different embedding model. That’s the best way to learn.

If you found this article useful, you can follow me on Instagram for daily AI tips and practical resources. You might also like my latest book, Hands-On GenAI, LLMs & AI Agents. It’s a step-by-step guide to help you get ready for jobs in today’s AI field.

Aman Kharwal

AI/ML Engineer | Published Author. My aim is to decode data science for the real world in the most simple words.
