Have you ever built a smart chatbot, only to find it forgets your conversation after just a couple of messages? You might ask it to analyze some data, then say, “Can you adjust that last calculation?”, but it gets confused because it lost the context. From what I’ve seen, integrating memory into AI agents is the biggest step you can take to move from simple, stateless scripts to building reliable, independent assistants.
When you start building AI agents for real-world use, you quickly realize that intelligence without context doesn’t get you very far. Today, I’ll show you how to build an agent that can remember.
Unpacking Agent Memory
First, it’s important to know that an AI model doesn’t really “remember” anything by itself. Each time you send a prompt, it starts from scratch. To give an agent memory, we need to bring past information into the current conversation.
In the AI ecosystem, we generally divide memory into three distinct categories:
- Short-Term Memory: This is your agent’s immediate context, like a chat history. It helps the agent keep track of a single conversation. But since Large Language Models (LLMs) have a limited context window, measured in tokens, you can’t put endless chat history into short-term memory without slowing things down or causing errors.
- Long-Term Memory: This lets an agent keep information over many days, sessions, or projects. We do this by turning text into numbers, called embeddings, and saving them in a Vector Database.
- Retrieval-Based Memory: This is how the agent finds information. When you ask a question, the agent looks through its long-term Vector Database for useful past details, pulls them out, and adds them to the current conversation before replying.
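To make the token-limit point about short-term memory concrete, here is a minimal sketch of trimming chat history to a budget. The helpers estimate_tokens and trim_history are hypothetical names, and counting whitespace-separated words is only a rough stand-in for a real tokenizer, so treat the numbers as illustrative.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: approximate tokens by counting whitespace-separated words.
    # A real tokenizer would give different (usually larger) counts.
    return len(text.split())


def trim_history(turns, max_tokens):
    """Keep the most recent (user, assistant) turns that fit within max_tokens."""
    kept = []
    used = 0
    # Walk backwards so the newest turns are considered first.
    for user_msg, assistant_msg in reversed(turns):
        cost = estimate_tokens(user_msg) + estimate_tokens(assistant_msg)
        if used + cost > max_tokens:
            break
        kept.append((user_msg, assistant_msg))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


turns = [
    ("What is my name?", "Your name is Aman."),
    ("Which language do I use?", "You mostly use Python."),
    ("Thanks!", "You're welcome."),
]
recent = trim_history(turns, max_tokens=12)
print(recent)  # the oldest turn no longer fits and is dropped
```

Dropping whole turns from the oldest end keeps each question paired with its answer, which is usually easier for the model to follow than cutting a message in half.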
Integrating Memory into AI Agents: The Practical Implementation
Let’s dive in. We’ll build an agent that can take in facts, save them in long-term memory, and use short-term memory to keep the conversation flowing naturally.
We’ll use Ollama to run a local, open-source model like Llama 3 or Mistral for free. For text embeddings, we’ll use HuggingFace, and for our local vector store, we’ll use ChromaDB.
Prerequisites
Before starting, make sure you have Ollama installed on your computer and run ollama pull llama3. Next, install these Python libraries:
pip install -U langchain langchain-community langchain-core langchain-ollama
pip install -U langchain-huggingface chromadb sentence-transformers
Step 1: Setting up the LLM and Embeddings
First, we need to set up our local LLM and the embedding model that will turn our text into searchable vectors:
from langchain_ollama import OllamaLLM
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
# 1. Initialize LLM
llm = OllamaLLM(model="llama3")
# 2. Initialize Embedding Model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
Here’s a breakdown of what we did above:
- We imported OllamaLLM, LangChain’s wrapper for Ollama, so we can talk to our local Llama 3 model. We also imported Chroma and Document, which we’ll use in the next step.
- We also imported HuggingFaceEmbeddings. Instead of paying for embeddings, we’re using all-MiniLM-L6-v2, which is a fast and efficient open-source model. It turns sentences into numbers so our database can understand what they mean.
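To see why turning sentences into numbers helps with search, here is a small sketch of comparing vectors with cosine similarity, a common way to score how close two embeddings are. The three-dimensional vectors below are made up for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions), and cosine_similarity and top_k are hypothetical helpers, not part of LangChain or ChromaDB.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings", chosen by hand so related texts point the same way.
toy_vectors = {
    "The user's name is Aman.": [0.9, 0.1, 0.0],
    "Aman's primary programming language is Python.": [0.1, 0.9, 0.2],
    "Aman is currently building an AI agent tutorial.": [0.0, 0.3, 0.9],
}
# Pretend this is the embedding of "Which coding language do I use the most?"
query_vector = [0.2, 0.8, 0.1]


def top_k(query, vectors, k):
    """Rank stored texts by similarity to the query and keep the best k."""
    ranked = sorted(
        vectors,
        key=lambda text: cosine_similarity(query, vectors[text]),
        reverse=True,
    )
    return ranked[:k]


best = top_k(query_vector, toy_vectors, k=1)
print(best)  # ["Aman's primary programming language is Python."]
```

A vector database does the same kind of ranking, only over many more vectors and with indexes that avoid comparing the query against every stored item.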
Step 2: Building Long-Term Memory (The Vector Store)
Now, let’s make a database that will store facts for the long term:
# 3. Create Long-Term Memory
memories = [
    Document(page_content="The user's name is Aman."),
    Document(page_content="Aman's primary programming language is Python."),
    Document(page_content="Aman is currently building an AI agent tutorial."),
]
# 4. Store Memories in ChromaDB
vector_store = Chroma.from_documents(
    documents=memories,
    embedding=embeddings,
    persist_directory="./agent_memory_db"
)
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
Here’s a breakdown of what we did above:
- We made a list of Document objects. In a real project, this data might come from uploaded PDFs, past user chats, or database records.
- We use Chroma.from_documents to turn these texts into embeddings and save them in a local folder (./agent_memory_db). Since we set persist_directory, this long-term memory will still be there even if we restart our Python script.
- We set up a retriever and set k=3, so the agent will only pull the top 3 most relevant memories. This helps save token space.
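One thing to watch with a persisted store: as far as I can tell, every run of the script adds the same three documents again, so the folder can slowly fill with duplicate copies of each fact. Here is a minimal sketch of one way around that, assuming you derive a stable ID from each memory’s text. The helpers stable_id and dedupe_memories are hypothetical, and passing the resulting IDs through the ids argument of Chroma.from_documents is my assumption about how you would wire it in, so check it against your installed version.

```python
import hashlib


def stable_id(text: str) -> str:
    """Derive a deterministic ID from the memory text (hypothetical helper)."""
    # Normalize case and spacing so trivially different copies share one ID.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def dedupe_memories(texts):
    """Return {id: text}, keeping the first occurrence of each distinct memory."""
    unique = {}
    for text in texts:
        unique.setdefault(stable_id(text), text)
    return unique


facts = [
    "The user's name is Aman.",
    "Aman's primary programming language is Python.",
    "  the user's name is AMAN.  ",  # same fact, different spacing and case
]
unique_facts = dedupe_memories(facts)
print(len(unique_facts))  # 2
```

Because the ID depends only on the text, saving the same fact twice would overwrite the existing entry instead of creating a second one, provided the store treats a repeated ID as an update.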
Step 3: Combining Short-Term Memory and Retrieval
Finally, we connect everything. Our agent will remember the current chat (Short-Term) and pull facts from ChromaDB (Long-Term):
# 5. Short-Term Memory
chat_history = []
MAX_HISTORY = 5
# 6. Chat Loop
print("Agent: Hello! Ask me anything.")
print("Type 'exit' to quit.\n")
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        print("Agent: Goodbye!")
        break
    # Retrieve relevant long-term memories
    retrieved_docs = retriever.invoke(user_input)
    retrieved_context = "\n".join(
        [doc.page_content for doc in retrieved_docs]
    )
    # Build short-term conversation context
    history_text = ""
    for human, assistant in chat_history[-MAX_HISTORY:]:
        history_text += f"Human: {human}\n"
        history_text += f"Assistant: {assistant}\n"
    # Final Prompt
    prompt = f"""
You are a helpful AI assistant.
Relevant Long-Term Memory:
{retrieved_context}
Recent Conversation:
{history_text}
Current User Question:
{user_input}
Answer naturally and use the memories when relevant.
"""
    # Generate Response
    answer = llm.invoke(prompt)
    print(f"\nAgent: {answer}\n")
    # Save Short-Term Memory
    chat_history.append(
        (user_input, answer)
    )
Here’s a breakdown of what we did above:
- We made an empty chat_history list to keep track of recent conversations as the agent’s short-term memory.
- We set MAX_HISTORY = 5, so the agent only remembers the last five interactions. This keeps prompts from getting too big.
- For each user question, we use the retriever to get the most relevant long-term memories from ChromaDB.
- We put these retrieved memories together into one context block and add the recent chat history too.
- We create a prompt that includes the long-term memory, short-term memory, and the user’s current question.
- We send this prompt to the Llama 3 model so it can answer using both types of memory.
- Finally, we save the latest user question and agent response to chat_history. This helps the agent keep track of the conversation.
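If you want to inspect or test the prompt without running the model, the assembly step can be pulled into its own function. This is a minimal sketch under my own assumptions: build_prompt is a hypothetical helper, the prompt layout is condensed from the loop above, and collections.deque with maxlen is swapped in for the list slice as another way to keep only the last MAX_HISTORY turns.

```python
from collections import deque

MAX_HISTORY = 5


def build_prompt(retrieved_memories, chat_history, user_input):
    """Combine long-term memories, recent turns, and the new question into one prompt."""
    retrieved_context = "\n".join(retrieved_memories)
    history_text = "".join(
        f"Human: {human}\nAssistant: {assistant}\n"
        for human, assistant in chat_history
    )
    return (
        "You are a helpful AI assistant.\n"
        f"Relevant Long-Term Memory:\n{retrieved_context}\n"
        f"Recent Conversation:\n{history_text}"
        f"Current User Question:\n{user_input}\n"
        "Answer naturally and use the memories when relevant.\n"
    )


# A deque with maxlen drops the oldest turn automatically once it is full.
chat_history = deque(maxlen=MAX_HISTORY)
for i in range(7):
    chat_history.append((f"question {i}", f"answer {i}"))

prompt = build_prompt(["The user's name is Aman."], chat_history, "What is my name?")
print(prompt)
```

After seven turns, only turns 2 through 6 remain in the deque, so the prompt size stays bounded no matter how long the chat runs.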
Here’s the final output:
Agent: Hello! Ask me anything.
Type 'exit' to quit.
You: What is my name?
Agent: Nice to chat with you again, Aman! You know, I've been keeping track of our conversations, and I recall that your name is... (drumroll please)... Aman! Yes, I have it recorded multiple times in my memory bank. So, to answer your question: your name is indeed Aman!
You: Which coding language do I use the most?
Agent: Nice follow-up question, Aman!
As we've discussed before, I remember that your primary programming language is Python. In fact, I've recorded it multiple times in my memory bank! So, to answer your question: you use Python as your go-to coding language.
You: My favorite AI framework is LangGraph.
Agent: Nice to chat with you again, Aman!
I see that you're working on an AI agent tutorial - that's a fascinating project! I'm happy to help you out.
Regarding your favorite AI framework being LangGraph, that's great! I remember our previous conversations about your interest in building AI agents. You had mentioned earlier that you were exploring different frameworks for your tutorial. LangGraph seems like a great choice!
As an AI assistant, I've been keeping track of our conversations, and I recall that you're quite familiar with Python. Since LangGraph is likely built upon or compatible with Python, do you have any specific questions about implementing it in your tutorial? I'm here to help!
You: Can you remind me what I said earlier?
Agent: Nice question, Aman! I'd be happy to help you recall what we discussed earlier.
Let me check my memory bank... Ah yes! I remember that you asked "What is my name?" And I replied, "Nice to chat with you again, Aman! You know, I've been keeping track of our conversations, and I recall that your name is... (drumroll please)... Aman! Yes, I have it recorded multiple times in my memory bank. So, to answer your question: your name is indeed Aman!"
And if I'm not mistaken, you also asked me "Which coding language do I use the most?" And I replied that you primarily use Python.
As for your favorite AI framework being LangGraph, we had a great conversation about it earlier too! You mentioned that you're working on an AI agent tutorial and exploring different frameworks. I remember offering to help you with implementing LangGraph in your project.
Is there anything specific you'd like me to recall or help you with regarding these conversations?
You: exit
Agent: Goodbye!
Real-World Applications and Common Pitfalls
In business environments, I use this setup a lot for customer support bots, personalized tutoring systems, and tools that search internal documents. For example, an HR bot uses long-term memory to look up company policies and short-term memory to handle follow-up questions smoothly.
But watch out for “memory corruption.” If you save everything a user says into long-term memory without checking, your vector database will fill up with unhelpful small talk like “Hello” or “Thanks.” Always add a step in between, usually with a smaller, faster LLM, to summarize and decide if something is worth saving for good.
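To show where that in-between step sits, here is a deliberately simple sketch. It swaps the small LLM for a rule-based filter, which is much cruder than what I described above but runs anywhere. The SMALL_TALK set, the worth_saving helper, and the four-word threshold are all hypothetical choices for illustration.

```python
import string

# Hypothetical list of phrases that carry no lasting information.
SMALL_TALK = {"hello", "hi", "hey", "thanks", "thank you", "ok", "okay", "bye"}


def worth_saving(message: str, min_words: int = 4) -> bool:
    """Return True if a message looks like a fact worth storing long term."""
    text = message.strip()
    if text.endswith("?"):
        return False  # questions ask for information rather than state it
    cleaned = text.lower().strip(string.punctuation + " ")
    if cleaned in SMALL_TALK:
        return False
    # Very short messages rarely contain a complete fact.
    return len(cleaned.split()) >= min_words


messages = [
    "Hello",
    "What is my name?",
    "My favorite AI framework is LangGraph.",
    "Thanks!",
]
to_store = [m for m in messages if worth_saving(m)]
print(to_store)  # ['My favorite AI framework is LangGraph.']
```

In a real agent, you would call a gate like this just before writing to the vector store, and only the messages that pass would be embedded and saved.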
Closing Thoughts
Building AI isn’t just about memorizing framework syntax. It’s really about understanding how data moves through your system.
Don’t let all the AI hype distract you. Focus on the basics. Try building this free, local setup on your own computer, play around with the history window size (MAX_HISTORY) and the number of retrieved memories (k), and see how the agent’s behavior changes. The best way to learn is by being consistent and experimenting for yourself.
I hope you enjoyed this article about integrating memory into AI agents.
For more AI and machine learning tips, follow me on Instagram. My book, Hands-On GenAI, LLMs & AI Agents, can also help you grow your AI career.





