If you’ve started exploring Retrieval-Augmented Generation (RAG), you might have noticed that getting a language model to answer questions over your own data is trickier than just calling an API. That’s where LlamaIndex comes in: it gives you a simple way to connect language models with your data, so you don’t have to build everything yourself.
In this article, I’ll show you how to build a simple RAG app that runs entirely on your own computer using LlamaIndex and Python, with no paid APIs needed. By the end, you’ll see how data ingestion, indexing, retrieval, and response generation all work together in practice.
Why LlamaIndex?
At a high level, a RAG system works in four stages:
- Data Ingestion: Load your documents (PDFs, text, etc.).
- Indexing: Convert text into embeddings and store them.
- Retrieval: Find the most relevant chunks for a query.
- Generation: Use an LLM to generate a final answer.
LlamaIndex connects all these parts. It takes care of chunking, embedding, and retrieval, so you can focus on building your app’s main features.
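To make these four stages concrete, here is a toy sketch in plain Python. Word-overlap scoring stands in for real embeddings, and a string template stands in for the LLM; every name here is illustrative, not LlamaIndex API.

```python
# A toy RAG pipeline: word overlap stands in for embeddings,
# and a string template stands in for the LLM.

def ingest(raw_texts):
    # Stage 1: load documents (here, just plain strings).
    return list(raw_texts)

def index(documents):
    # Stage 2: "embed" each document as a set of lowercase words.
    return [(doc, set(doc.lower().split())) for doc in documents]

def retrieve(query, indexed, top_k=1):
    # Stage 3: rank documents by word overlap with the query.
    q_words = set(query.lower().split())
    ranked = sorted(indexed, key=lambda pair: len(q_words & pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def generate(query, context):
    # Stage 4: a real system would send this prompt to an LLM.
    return f"Answer '{query}' using: {' | '.join(context)}"

docs = ingest(["Python is great for data work", "Rust is fast and safe"])
idx = index(docs)
answer = generate("Is Python good for data?",
                  retrieve("Is Python good for data?", idx))
```

Real systems replace the word-overlap scoring with dense vector similarity, which is exactly what LlamaIndex wires up for you below.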
If you want to go beyond basic RAG setups and build real-world AI systems like this, I’ve covered it step-by-step in my book: Hands-On GenAI, LLMs & AI Agents.
LlamaIndex for Building RAG Apps: Getting Started
We’ll use only open-source tools:
- LlamaIndex
- Sentence Transformers (for embeddings)
- A local LLM (like Ollama or HuggingFace models)
Start by installing these dependencies:
```bash
pip install llama-index
pip install llama-index-llms-ollama
pip install llama-index-embeddings-huggingface
pip install sentence-transformers
```
Next, install Ollama. Once it’s installed, pull Llama 3:

```bash
ollama pull llama3
```
Step 1: Load Your Data
Let’s assume you have a folder called data/ with text files:
```python
from llama_index.core import SimpleDirectoryReader

# Load documents from a directory
documents = SimpleDirectoryReader("data").load_data()
print(f"Loaded {len(documents)} documents")
```

LlamaIndex reads all the files in your folder and turns them into a standard document format. These documents are then ready for indexing.
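Conceptually, the loader is doing something like the stdlib sketch below: walking the folder and wrapping each file’s text in a simple record. The `load_folder` helper and the dict fields are illustrative, not LlamaIndex’s actual internals.

```python
# Mimic a directory loader with the stdlib: read each .txt file
# and keep its text alongside a little metadata.
from pathlib import Path
from tempfile import TemporaryDirectory

def load_folder(folder):
    docs = []
    for path in sorted(Path(folder).glob("*.txt")):
        docs.append({"text": path.read_text(), "file_name": path.name})
    return docs

with TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("Built data pipelines on AWS")
    Path(tmp, "resume.txt").write_text("Skills: Python, SQL")
    loaded = load_folder(tmp)

print(f"Loaded {len(loaded)} documents")  # Loaded 2 documents
```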
Step 2: Set Up the Embedding Model
Next, we’ll use a free HuggingFace embedding model:
```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```

Embeddings turn text into vectors, which lets us compare meaning instead of just matching keywords. This is what makes retrieval effective.
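The comparison itself is typically cosine similarity. Here is a minimal sketch with hand-made three-dimensional toy vectors; real embeddings from all-MiniLM-L6-v2 have 384 dimensions, but the math is the same.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative values only).
query_vec = [0.9, 0.1, 0.0]   # "python programming"
doc_a     = [0.8, 0.2, 0.1]   # a doc about coding: similar direction
doc_b     = [0.0, 0.1, 0.9]   # a doc about cooking: different direction

print(cosine(query_vec, doc_a) > cosine(query_vec, doc_b))  # True
```

The retriever ranks chunks by exactly this kind of score, so chunks pointing in the same "direction" as the query surface first even when no keywords match.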
Step 3: Set Up the Local LLM
Now we configure a local LLM using Ollama:
```python
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3")
```

You’ll use this model later to generate answers using the information you’ve retrieved.
Step 4: Build the Index
Now we combine everything into an index:
```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
)
```

In this step, the documents are split into chunks, each chunk is turned into an embedding, and then everything is stored in a vector index.
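The chunking step can be sketched as a naive fixed-size splitter with overlap. LlamaIndex’s real splitter is sentence- and token-aware, so treat this character-based version purely as intuition for why neighboring chunks share some text.

```python
# Naive fixed-size chunking with overlap: each chunk repeats the
# tail of the previous one so sentences aren't cut off blindly.

def chunk_text(text, chunk_size=20, overlap=5):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

sample = "".join(chr(97 + i % 26) for i in range(50))  # 50 letters
pieces = chunk_text(sample, chunk_size=20, overlap=5)
print(len(pieces))  # 4
```

The overlap means a fact straddling a chunk boundary still appears whole in at least one chunk, which improves retrieval quality.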
This forms the core of your RAG system.
Step 5: Create a Query Engine
Now we turn the index into something we can query:
query_engine = index.as_query_engine(llm=llm)With this line, the Retriever finds the most relevant chunks, and the language model generates the final answer.
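In miniature, the query engine’s job looks like the sketch below: score chunks against the query, keep the best ones, and assemble them into a prompt for the LLM. The helper names are illustrative, word overlap again stands in for embedding similarity, and the actual LLM call is left out.

```python
# What a query engine wires together, in miniature.

def retrieve_top_k(query, chunks, k=2):
    # Rank chunks by word overlap with the query (a stand-in for
    # embedding similarity), keeping the top k.
    q = set(query.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, context_chunks):
    # Stuff the retrieved chunks into the prompt the LLM will see.
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "The candidate knows Python and SQL.",
    "Hobbies include hiking and chess.",
    "Experience with AWS S3 and Lambda.",
]
question = "What Python skills does the candidate have?"
prompt = build_prompt(question, retrieve_top_k(question, chunks))
```

The real engine adds details like embedding the query and formatting per-model prompt templates, but this grounding-context-then-question shape is the heart of it.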
Step 6: Ask Questions
Now let’s test the system:
```python
response = query_engine.query("What are the candidate's key skills?")
print(response)
```

Example output:

```text
The candidate's key skills include:

- Programming: Python, SQL
- Machine Learning: Scikit-learn, TensorFlow
- Data Analysis: Pandas, NumPy
- Cloud Platforms: AWS (S3, Lambda)
- Tools: Git, Docker

They also demonstrate experience in model deployment, data preprocessing, and building scalable data pipelines.
```
Behind the scenes:
- The query is embedded.
- Relevant chunks are retrieved.
- The LLM generates an answer using that context.
That’s how a full RAG pipeline works in practice.
Closing Thoughts
Learning to use LlamaIndex for RAG apps isn’t just about memorizing APIs. It’s about understanding how data moves through the system.
Once you see how ingestion, indexing, retrieval, and generation fit together, you’ll be able to swap out parts, scale your system, and solve problems more easily.
I hope you found this article on building RAG apps with LlamaIndex helpful.
For more AI and machine learning tips, follow me on Instagram. My book, Hands-On GenAI, LLMs & AI Agents, can also help you grow your AI career.