AI agents depend on the quality of data and tools they can use. That’s where the Model Context Protocol (MCP) helps. MCP is fast becoming the go-to way to connect AI models with outside tools and data. Still, many developers set up MCP servers that only fetch database rows or read files. In this guide, I’ll show you how to add an LLM to your MCP server.
Why Put an LLM in an MCP Server?
Normally, a client (such as Claude Desktop or a custom LangChain agent) communicates with your MCP server. The client says, “Get me user data,” and your server queries a SQL database, returning raw text.
In practice, sending large amounts of raw data to the main client is not efficient. It can cause context window issues, slow things down, and increase API costs if you use a paid model.
If you add a local LLM to your MCP server, you can make smarter tools. For example, instead of sending back 10,000 lines of logs, your server can use the LLM to find and return just the unusual entries.
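To make the trimming idea concrete before we involve a model at all: the simplest version of server-side reduction is a deterministic pre-filter. This sketch (my own illustration, not part of the tutorial's server code) keeps only the log lines worth looking at:

```python
def filter_unusual_entries(log_text: str, levels: tuple = ("ERROR", "WARN")) -> str:
    """Keep only log lines containing one of the given severity levels.

    A deterministic stand-in for the LLM step: instead of shipping
    every line to the client, return only the entries that matter.
    """
    unusual = [
        line for line in log_text.splitlines()
        if any(level in line for level in levels)
    ]
    return "\n".join(unusual)

logs = """2026-02-27 10:42:10 INFO Request received
2026-02-27 10:42:13 ERROR Database connection timeout
2026-02-27 10:42:18 WARN Retrying connection"""

print(filter_unusual_entries(logs))
```

An LLM generalizes this beyond fixed keywords, but the architectural point is the same: reduce on the server, then send.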
Add an LLM to Your MCP Server: Getting Started
To keep things free, private, and practical, we will use:
- Python: The standard language for AI engineering.
- FastMCP: A lightweight Python SDK for building MCP servers quickly.
- Ollama: This tool lets you run LLMs on your own machine. We’ll use the llama3.2 model, which works on most modern laptops and is powerful enough for many tasks.
Let’s get started by building an MCP server that offers a smart text analysis tool.
Step 1: Environment Setup
First, check that Ollama is installed on your computer. Then, open your terminal and download the model we’ll use:
ollama pull llama3.2
Next, set up your Python environment. Make a virtual environment and install the needed libraries:
python -m venv mcp_env
source mcp_env/bin/activate # On Windows, use mcp_env\Scripts\activate
pip install ollama
pip install "mcp[cli]"
Step 2: Writing the Server Code
Create a new file called server.py. We’ll use FastMCP, which handles the complex JSON-RPC communication for you:
from mcp.server.fastmcp import FastMCP
import ollama

# 1. Initialize the FastMCP server
mcp = FastMCP("Local Intelligence Server")

# 2. Define our tool using the decorator
@mcp.tool()
def analyze_data(raw_text: str, instruction: str) -> str:
    """
    Analyzes raw text using a local LLM based on specific instructions.
    Useful for summarizing logs, extracting entities, or formatting raw data.
    """
    # 3. Construct the prompt
    prompt = f"Follow this instruction: {instruction}\n\nData to analyze:\n{raw_text}"

    try:
        # 4. Call the local LLM via Ollama
        response = ollama.chat(
            model='llama3.2',
            messages=[
                {'role': 'system', 'content': 'You are a precise data analysis assistant.'},
                {'role': 'user', 'content': prompt}
            ]
        )
        # 5. Return the model's output
        return response['message']['content']
    except Exception as e:
        return f"Error processing data locally: {str(e)}"

# 6. Run the server
if __name__ == "__main__":
    mcp.run()

Here’s what’s happening in this setup. This is the pattern you’ll use in real projects:
- Server Initialization: FastMCP("Local Intelligence Server") creates the server instance (it isn’t actually started until mcp.run() is called). Giving it a clear name helps the client know what it’s connecting to.
- The @mcp.tool() Decorator: FastMCP reads the function signature (raw_text, instruction) and the docstring. It turns these into a JSON schema that the client can understand.
- Prompt Construction: We format the inputs clearly so the local model understands the task.
- Local Inference: We send the prompt to ollama.chat. Since this runs on your own machine, there’s no network delay, and your raw_text stays private.
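To make the decorator point concrete: from the function signature and docstring above, FastMCP derives a machine-readable tool description for the client. The dictionary below is my own rough approximation of that generated schema; the exact field names and layout depend on your SDK version.

```python
# Illustrative only: an approximation of the JSON schema FastMCP
# derives from analyze_data's signature and docstring. Exact field
# names may differ between SDK versions.
tool_schema = {
    "name": "analyze_data",
    "description": (
        "Analyzes raw text using a local LLM based on specific instructions. "
        "Useful for summarizing logs, extracting entities, or formatting raw data."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "raw_text": {"type": "string"},
            "instruction": {"type": "string"},
        },
        "required": ["raw_text", "instruction"],
    },
}

print(tool_schema["inputSchema"]["required"])
```

This is why type hints and docstrings matter here: they are not just documentation, they become the contract the client sees.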
Step 3: Testing the LLM Inside Your MCP Server
Start the MCP server by running:
mcp dev server.py
Your MCP server will start locally, and the MCP Inspector will open in your default browser. Connect to the server from the Inspector; the connection settings (STDIO transport, command, and arguments) are typically pre-filled when you launch via mcp dev.
Next, go to Tools, then click “List Tools”. Select the analyze_data tool and provide sample inputs:
raw_text
2026-02-27 10:42:13 ERROR Database connection timeout
2026-02-27 10:42:18 WARN Retrying connection
2026-02-27 10:43:01 ERROR Failed to fetch user profile
instruction
Analyze these logs and identify the root cause and suggested fixes.
Instead of sending back raw logs, the MCP server will use the local LLM to give you a clear, structured analysis. This shows that:
- The MCP tool is correctly registered
- The LLM is being invoked locally via Ollama
- The server is returning intelligent, processed output
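The LLM handles the open-ended analysis, but you can also hand it cleaner input by parsing the logs deterministically first. Here is a small stdlib-only sketch (the pattern and function names are my own, assuming the "timestamp LEVEL message" format of the sample above):

```python
import re

# Matches lines like: "2026-02-27 10:42:13 ERROR Database connection timeout"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)

def parse_logs(raw_text: str) -> list:
    """Turn 'timestamp LEVEL message' lines into structured records."""
    records = []
    for line in raw_text.splitlines():
        match = LOG_PATTERN.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records

sample = """2026-02-27 10:42:13 ERROR Database connection timeout
2026-02-27 10:42:18 WARN Retrying connection
2026-02-27 10:43:01 ERROR Failed to fetch user profile"""

errors = [r for r in parse_logs(sample) if r["level"] == "ERROR"]
print(len(errors))  # → 2
```

Feeding the model structured records (or only the ERROR subset) instead of raw text tends to make its answers more focused and cheaper to produce.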
Closing Thoughts
That’s how you can add an LLM to your MCP server.
Learning to use the Model Context Protocol is a valuable skill right now. It connects AI models to real-world systems. Start with this simple setup, try different instructions, and then connect it to your own files or databases.
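If you want to use the server from Claude Desktop rather than the Inspector, you register it in Claude’s claude_desktop_config.json. A minimal entry might look like the following; the server name is arbitrary and the path is a placeholder for wherever your server.py actually lives (you may need the full path to your virtual environment’s Python as well):

```json
{
  "mcpServers": {
    "local-intelligence": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}
```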
If you found this article helpful, you can follow me on Instagram for daily AI tips and practical resources. You may also be interested in my latest book, Hands-On GenAI, LLMs & AI Agents, a step-by-step guide to prepare you for careers in today’s AI industry.