How to Use LlamaIndex to Build RAG Apps

If you’ve started exploring Retrieval-Augmented Generation (RAG), you might have noticed that getting a language model to answer questions using your own data is trickier than just calling an API. That’s where LlamaIndex comes in. It gives you a simple way to connect language models with your data, so you don’t have to build everything yourself.

In this article, I’ll show you how to build a simple RAG app that runs entirely on your own computer using LlamaIndex and Python, with no paid APIs needed. By the end, you’ll see how data ingestion, indexing, retrieval, and response generation all work together in practice.

Why LlamaIndex?

At a high level, a RAG system works in four stages:

  1. Data Ingestion: Load your documents (PDFs, text, etc.).
  2. Indexing: Convert text into embeddings and store them.
  3. Retrieval: Find the most relevant chunks for a query.
  4. Generation: Use an LLM to generate a final answer.

LlamaIndex connects all these parts. It takes care of chunking, embedding, and retrieval, so you can focus on building your app’s main features.

If you want to go beyond basic RAG setups and build real-world AI systems like this, I’ve covered it step-by-step in my book: Hands-On GenAI, LLMs & AI Agents.

LlamaIndex for Building RAG Apps: Getting Started

We’ll use only open-source tools:

  1. LlamaIndex
  2. Sentence Transformers (for embeddings)
  3. A local LLM served through Ollama (or a Hugging Face model)

Start by installing these dependencies:

pip install llama-index
pip install llama-index-llms-ollama
pip install llama-index-embeddings-huggingface
pip install sentence-transformers

Next, install Ollama. Once it’s installed, pull the Llama 3 model:

ollama pull llama3
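
To confirm the model downloaded correctly, you can send it a quick prompt straight from the terminal (this is standard Ollama usage, nothing LlamaIndex-specific):

ollama run llama3 "Say hello in one sentence."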

Step 1: Load Your Data

Let’s assume you have a folder called data/ with text files; in this example, it holds a sample resume, which we’ll query later:

from llama_index.core import SimpleDirectoryReader

# Load documents from a directory
documents = SimpleDirectoryReader("data").load_data()

print(f"Loaded {len(documents)} documents")

LlamaIndex reads all the files in your folder and turns them into a standard document format. These documents are then ready for indexing.
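
If you want to sanity-check what was loaded, each Document object exposes its text and metadata. A quick sketch:

# Peek at the first document to verify ingestion
first = documents[0]
print(first.metadata)       # file name, path, etc.
print(first.text[:200])     # first 200 characters of the content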

Step 2: Set Up the Embedding Model

Next, we’ll use a free HuggingFace embedding model:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

Embeddings turn text into vectors, which lets us compare meaning instead of just matching keywords. This is what makes retrieval effective.
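
If you’re curious, you can embed a sentence directly and inspect the result. A small sketch; all-MiniLM-L6-v2 produces 384-dimensional vectors:

# Embed a sample sentence and check the vector size
vector = embed_model.get_text_embedding("RAG combines retrieval with generation.")
print(len(vector))  # 384 for all-MiniLM-L6-v2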

Step 3: Set Up the Local LLM

Now we configure a local LLM using Ollama:

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3")

You’ll use this model later to generate answers using the information you’ve retrieved.
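
Before wiring the model into the pipeline, it helps to confirm Ollama is running and responding. A minimal smoke test (if the model is slow to load on your machine, Ollama also accepts an optional request_timeout argument):

# Quick check that the local model responds
print(llm.complete("In one sentence, what is RAG?"))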

Step 4: Build the Index

Now we combine everything into an index:

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model
)

In this step, the documents are split into chunks, each chunk is turned into an embedding, and then everything is stored in a vector index.

This forms the core of your RAG system.
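
Re-embedding every document on each run gets slow as your data grows. Here is a sketch of persisting the index to disk and reloading it later, assuming a recent llama-index version (the storage directory name is just an example):

from llama_index.core import StorageContext, load_index_from_storage

# Save the index so you don't re-embed on every run
index.storage_context.persist(persist_dir="storage")

# Later: reload the saved index instead of rebuilding it
storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context, embed_model=embed_model)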

Step 5: Create a Query Engine

Now we turn the index into something we can query:

query_engine = index.as_query_engine(llm=llm)

With this line, the retriever finds the most relevant chunks for each question, and the language model generates the final answer from them.
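
The query engine also accepts retrieval parameters. For example, similarity_top_k controls how many chunks are retrieved per question (the exact default is version-dependent, so treat this as a tuning knob):

# Retrieve more context per question if answers feel incomplete
query_engine = index.as_query_engine(llm=llm, similarity_top_k=5)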

Step 6: Ask Questions

Now let’s test the system:

response = query_engine.query("What are the candidate's key skills?")
print(response)

Output:

The candidate's key skills include:

Programming: Python, SQL
Machine Learning: Scikit-learn, TensorFlow
Data Analysis: Pandas, NumPy
Cloud Platforms: AWS (S3, Lambda)
Tools: Git, Docker

They also demonstrate experience in model deployment, data preprocessing, and building scalable data pipelines.

Behind the scenes:

  1. The query is embedded.
  2. Relevant chunks are retrieved.
  3. The LLM generates an answer using that context.

That’s how a full RAG pipeline works in practice.
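
You can watch this happen by inspecting the response object, which carries the retrieved chunks alongside the generated text via its source_nodes attribute:

# See which chunks the retriever handed to the LLM
for node in response.source_nodes:
    print(f"score={node.score}")
    print(node.get_content()[:150])  # first 150 characters of the chunk
    print("---")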

Closing Thoughts

Learning to use LlamaIndex for RAG apps isn’t just about memorizing APIs. It’s about understanding how data moves through the system.

Once you see how ingestion, indexing, retrieval, and generation fit together, you’ll be able to swap out parts, scale your system, and solve problems more easily.

I hope you found this article on building RAG apps with LlamaIndex helpful.

For more AI and machine learning tips, follow me on Instagram. My book, Hands-On GenAI, LLMs & AI Agents, can also help you grow your AI career.
