Standard RAG relies on vector search, which works like searching a library by meaning: it returns the passages most semantically similar to your query. It’s good for finding specific facts, but weak at connecting facts that are spread across different documents. That’s where GraphRAG comes in. Instead of treating your data as separate documents, GraphRAG treats it as a network of connected facts. In this article, I’ll show you how to build a GraphRAG pipeline for smarter retrieval.
What is GraphRAG?
Picture yourself at a dinner party. Vector RAG is like the guest who has memorized many encyclopedias. If you ask, “Who is Sam Altman?”, they simply recite his biography.
GraphRAG is like the guest who knows all the connections between people. If you ask, “Who is Sam Altman?”, they reply, “He started OpenAI, which got billions from Microsoft. By the way, Microsoft also owns GitHub.”
GraphRAG uses a knowledge graph, which is a network of entities (nodes) and relationships (edges). This lets it move from one fact to another and surface indirect connections that vector search often misses.
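To make the node-and-edge idea concrete, here is a minimal sketch of the dinner-party example as a directed graph. It assumes NetworkX (the library used later in the pipeline); the graph name `g` and the relation labels are illustrative choices, not a fixed schema.

```python
import networkx as nx

# A tiny knowledge graph: entities are nodes, relationships are labeled edges.
g = nx.DiGraph()
g.add_edge("Sam Altman", "OpenAI", label="started")
g.add_edge("Microsoft", "OpenAI", label="invested_in")
g.add_edge("Microsoft", "GitHub", label="owns")

# Move from one fact to another: find everything that points at OpenAI,
# then list what else those entities are connected to.
related = [
    (src, g[src][dst]["label"], dst)
    for src in g.predecessors("OpenAI")
    for dst in g.successors(src)
    if dst != "OpenAI"
]
print(related)  # [('Microsoft', 'owns', 'GitHub')]
```

Starting from OpenAI, two hops are enough to reach the "by the way" fact about GitHub, even though no single edge mentions OpenAI and GitHub together.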
Let’s get a better understanding by building a GraphRAG pipeline from the ground up.
Build a GraphRAG Pipeline
We are going to build a pipeline that:
- Reads text.
- Extracts relationships (Subject -> Predicate -> Object).
- Builds a Graph using NetworkX.
- Retrieves context by walking the graph (Multi-hop reasoning).
- Answers a question based on that deep context.
You’ll need a few libraries installed. In your terminal, run:
pip install networkx langchain langchain-ollama
Step 1: Loading the LLM
We need an LLM to do the main work, like reading text and pulling out logic. Here, we’ll use Ollama to run Mistral on your own machine. It’s quick, free, and works well for reasoning tasks.
Before starting, make sure Ollama is installed, then run this command in your terminal:
ollama pull mistral
Now, let’s load the LLM:
import networkx as nx
from langchain_ollama import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser

# 1. Load Local LLM
# Temperature=0 is crucial here. We want facts, not creativity.
llm = ChatOllama(model="mistral", temperature=0)
For production, you might use GPT-4o or Claude 3.5 Sonnet for better accuracy. But for learning, Mistral is a great choice.
Step 2: Turning Text into Data
This step is the most important. We can’t put raw text straight into a graph; we need triples. A triple is the basic unit of a knowledge graph: (Head) -> [Relation] -> (Tail).
We’ll use a JsonOutputParser to parse the LLM’s reply into Python lists and dictionaries, and we’ll prompt the model to return only JSON instead of a conversation:
# 2. Prompt for Extracting Graph Triples
extract_prompt = PromptTemplate(
    template="""
You are an expert knowledge graph builder.
Extract entities and relationships from the text.
Return ONLY a JSON list. Each item must contain:
- "head": source entity
- "relation": relationship
- "tail": target entity
Text:
{text}
Output JSON:
""",
    input_variables=["text"],
)
extraction_chain = extract_prompt | llm | JsonOutputParser()
Step 3: The Data Source
Let’s test our system with a short example about the AI industry. The facts are in separate sentences, and our goal is to connect them:
# 3. Enterprise Knowledge Example
company_text = """
OpenAI was founded by Sam Altman and Elon Musk.
OpenAI developed GPT-4.
GPT-4 powers ChatGPT.
Microsoft partnered with OpenAI.
Microsoft invested 10 billion dollars in OpenAI.
ChatGPT is used by millions of users worldwide.
"""
print("\n Extracting knowledge graph triples...\n")
triples = extraction_chain.invoke({"text": company_text})
print(triples)
Extracting knowledge graph triples...
[{'head': 'OpenAI', 'relation': 'founded_by', 'tail': 'Sam Altman'}, {'head': 'OpenAI', 'relation': 'founded_by', 'tail': 'Elon Musk'}, {'head': 'OpenAI', 'relation': 'developed', 'tail': 'GPT-4'}, {'head': 'GPT-4', 'relation': 'powers', 'tail': 'ChatGPT'}, {'head': 'OpenAI', 'relation': 'partnered_with', 'tail': 'Microsoft'}, {'head': 'Microsoft', 'relation': 'invested_in', 'tail': 'OpenAI'}, {'head': 'Microsoft', 'relation': 'invested_amount', 'tail': '10 billion dollars'}, {'head': 'ChatGPT', 'relation': 'used_by', 'tail': 'millions of users worldwide'}]
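A local model will not always return output this clean: items can be duplicated, keys can be missing, or the reply may not be a list at all. The following is a minimal sketch of a cleaning step you could run before building the graph. `clean_triples` is a hypothetical helper written for this article, not part of LangChain, and the sample input is made up to show the failure cases.

```python
def clean_triples(raw):
    """Keep only well-formed triples, strip whitespace, drop exact duplicates.

    `raw` is whatever the JSON parser returned; it may not be a list,
    and individual items may be missing keys or hold non-string values.
    """
    if not isinstance(raw, list):
        return []
    cleaned = []
    seen = set()
    for item in raw:
        if not isinstance(item, dict):
            continue
        parts = [item.get(key) for key in ("head", "relation", "tail")]
        if not all(isinstance(p, str) and p.strip() for p in parts):
            continue
        head, relation, tail = (p.strip() for p in parts)
        key = (head, relation, tail)
        if key not in seen:
            seen.add(key)
            cleaned.append({"head": head, "relation": relation, "tail": tail})
    return cleaned


sample = [
    {"head": "OpenAI", "relation": "developed", "tail": "GPT-4"},
    {"head": "OpenAI", "relation": "developed", "tail": "GPT-4"},  # duplicate
    {"head": "Microsoft", "relation": "invested_in"},  # missing tail
    "not a dict",
]
print(clean_triples(sample))
# [{'head': 'OpenAI', 'relation': 'developed', 'tail': 'GPT-4'}]
```

In the pipeline, this would sit between the extraction call and the graph-building step, for example `triples = clean_triples(triples)`.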
Step 4: Building the Graph
Now we’ll use NetworkX, a Python library for working with graphs. We’ll take the JSON triples from Step 3 and actually create the connections:
# 4. Build Knowledge Graph
kg = nx.DiGraph()  # DiGraph means "Directed Graph" (arrows point one way)

def build_knowledge_graph(triples):
    for item in triples:
        head = item.get("head")
        tail = item.get("tail")
        relation = item.get("relation")
        if head and tail:
            kg.add_node(head)
            kg.add_node(tail)
            kg.add_edge(head, tail, label=relation)

build_knowledge_graph(triples)
print("\n Nodes in Graph:")
print(list(kg.nodes()))
Nodes in Graph:
['OpenAI', 'Sam Altman', 'Elon Musk', 'GPT-4', 'ChatGPT', 'Microsoft', '10 billion dollars', 'millions of users worldwide']
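One caveat worth knowing: nx.DiGraph stores at most one edge per ordered (head, tail) pair, so a second relation between the same two entities in the same direction replaces the first label. The sketch below shows that behavior next to nx.MultiDiGraph, which keeps both edges. The two example triples are hypothetical, chosen to trigger the collision.

```python
import networkx as nx

pairs = [
    ("Microsoft", "OpenAI", "partnered_with"),
    ("Microsoft", "OpenAI", "invested_in"),
]

simple = nx.DiGraph()
multi = nx.MultiDiGraph()
for head, tail, relation in pairs:
    simple.add_edge(head, tail, label=relation)  # second call overwrites the label
    multi.add_edge(head, tail, label=relation)   # each call adds a parallel edge

simple_labels = [label for _, _, label in simple.edges(data="label")]
multi_labels = [label for _, _, label in multi.edges(data="label")]
print(simple_labels)  # ['invested_in']
print(multi_labels)   # ['partnered_with', 'invested_in']
```

If you switch the pipeline to a MultiDiGraph, the retrieval code has to change too, because `get_edge_data(u, v)` then returns a dictionary of parallel edges keyed by edge key rather than a single attribute dictionary.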
Step 5: Multi-Hop
This is where smart retrieval happens. In standard RAG, searching for “ChatGPT” typically returns the sentence “ChatGPT is used by millions of users worldwide.” With GraphRAG, we start at “ChatGPT” and explore its connections:
- Start at ChatGPT.
- Look backward: “Powered by GPT-4”.
- Walk to GPT-4: “Developed by OpenAI”.
- Walk to OpenAI: “Invested in by Microsoft”.
Now we can see that Microsoft is linked to ChatGPT, even though they were never mentioned together in the same sentence in the original text.
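The walk described above can be checked directly with NetworkX. This is a minimal sketch on a small hand-built graph (`kg_demo` is a hypothetical name, and the edges are a subset of the triples extracted earlier). It ignores edge direction when searching, since the walk follows some arrows backward, then recovers the original directed fact for each hop.

```python
import networkx as nx

kg_demo = nx.DiGraph()
kg_demo.add_edge("OpenAI", "GPT-4", label="developed")
kg_demo.add_edge("GPT-4", "ChatGPT", label="powers")
kg_demo.add_edge("Microsoft", "OpenAI", label="invested_in")

# Search on an undirected copy so the path may run against an arrow.
path = nx.shortest_path(kg_demo.to_undirected(), "ChatGPT", "Microsoft")
print(path)  # ['ChatGPT', 'GPT-4', 'OpenAI', 'Microsoft']

# Turn each hop back into the directed fact stored in the graph.
facts = []
for a, b in zip(path, path[1:]):
    if kg_demo.has_edge(a, b):
        facts.append(f"{a} {kg_demo[a][b]['label']} {b}")
    else:
        facts.append(f"{b} {kg_demo[b][a]['label']} {a}")
print(facts)
# ['GPT-4 powers ChatGPT', 'OpenAI developed GPT-4', 'Microsoft invested_in OpenAI']
```

A shortest path answers "how are these two entities linked?" when you already know both ends. Retrieval usually knows only the starting entity, which is why the pipeline explores outward from it instead.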
Here’s how to implement the retrieval as a depth-limited traversal:
# 5. MULTI-HOP RETRIEVAL
def retrieve_graph_context(entity, max_depth=2):
    context = set()
    visited_nodes = set()

    def dfs(node, depth):
        # Nodes more than max_depth hops from the start are not expanded
        if depth > max_depth:
            return
        visited_nodes.add(node)
        # 1. Check Outgoing edges (What does this node do?)
        for neighbor in kg.successors(node):
            relation = kg.get_edge_data(node, neighbor)["label"]
            context.add(f"{node} {relation} {neighbor}")
            if neighbor not in visited_nodes:
                dfs(neighbor, depth + 1)
        # 2. Check Incoming edges (Who interacts with this node?)
        for predecessor in kg.predecessors(node):
            relation = kg.get_edge_data(predecessor, node)["label"]
            context.add(f"{predecessor} {relation} {node}")
            if predecessor not in visited_nodes:
                dfs(predecessor, depth + 1)

    if entity in kg.nodes:
        # Start the traversal; the start node is depth 0, so max_depth counts hops
        dfs(entity, 0)
    # The context is a set, so the order of the joined facts is not guaranteed
    return ". ".join(context)
Step 6: The Final Answer
Finally, we feed that rich, interconnected context back to the LLM to answer the user’s question:
# 6. Final RAG Prompt
final_prompt = PromptTemplate(
    template="""
Answer the question using ONLY the context below.
Context:
{context}
Question:
{question}
Answer:
""",
    input_variables=["context", "question"]
)
rag_chain = final_prompt | llm

# 7. Ask a Multi-hop Reasoning Question
entity = "ChatGPT"
# We ask for a depth of 3 to catch distant connections
graph_context = retrieve_graph_context(entity, max_depth=3)
print("\n Retrieved Graph Context:\n")
print(graph_context)

question = "Which company invested in the company that built ChatGPT?"
response = rag_chain.invoke({
    "context": graph_context,
    "question": question
})
print("\n Final Answer:\n")
print(response.content)
Retrieved Graph Context:
GPT-4 powers ChatGPT. OpenAI founded_by Sam Altman. Microsoft invested_amount 10 billion dollars. OpenAI developed GPT-4. Microsoft invested_in OpenAI. ChatGPT used_by millions of users worldwide. OpenAI founded_by Elon Musk. OpenAI partnered_with Microsoft
Final Answer:
Microsoft invested in the company that built ChatGPT.
The model identifies Microsoft. This works because the retrieved context includes the full chain: Microsoft -> invested_in -> OpenAI -> developed -> GPT-4 -> powers -> ChatGPT.
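The run above hardcodes the starting entity as "ChatGPT". Here is a minimal sketch of choosing the start node from the question itself, assuming plain case-insensitive substring matching against the graph's node names. `find_entities` is a hypothetical helper; a real system would likely need alias handling or an LLM-based extractor.

```python
def find_entities(question, node_names):
    """Return node names that literally appear in the question, longest first."""
    lowered = question.lower()
    matches = [name for name in node_names if name.lower() in lowered]
    return sorted(matches, key=len, reverse=True)


nodes = ["OpenAI", "Sam Altman", "Elon Musk", "GPT-4", "ChatGPT", "Microsoft"]
question = "Which company invested in the company that built ChatGPT?"
print(find_entities(question, nodes))  # ['ChatGPT']
```

In the pipeline, `nodes` would be `list(kg.nodes())`, and the first match (or each match in turn) would be passed to `retrieve_graph_context`.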
Closing Thoughts
You might think you could just read the text to find this out. That’s true for a few lines, but imagine having 10,000 PDF reports.
In fraud detection, GraphRAG can connect a suspicious phone number to an address, a previous claim, and a known fraudster. Vector search alone would likely miss these links, because no single document states them together. In medical research, it can connect Drug X to Protein Y to Disease Z across thousands of research papers.