Not long ago, using powerful AI meant sending your data to distant server farms. It was like renting a supercomputer: useful, but not personal. Now, with efficient models like Llama 3 and tools like Ollama, things have changed. You can run AI on your own machine. In this article, I’ll walk you through building your first local LLM app, even if you’re new to LLMs.
Build a Local LLM App: Getting Started
We’ll build a working chat-based AI app in Python that runs only on your computer. Your data stays private, there are no API fees, and it’s just you and your code.
To follow along, you need a few dependencies:
- Install Python: Ensure you have Python installed (version 3.8 or higher is recommended).
- Install Ollama: Go to ollama.com and download the installer for your OS.
Once installed, open your terminal and run:
ollama pull llama3.2
This command downloads the core model we’ll use.
Next, open your terminal and install the Python libraries we need:
pip install streamlit ollama
Let’s start building your first local LLM app.
Step 1: Imports and Configuration
First, import the necessary libraries and set up the browser tab:
import streamlit as st
import ollama
# Set the page configuration
st.set_page_config(page_title="My Local AI", page_icon="🤖")
st.title("🤖 Local Llama Chatbot")
st.caption("Running locally with Llama 3.2 - No Data Leaves This PC!")

st.set_page_config changes the tab title and icon to make the app look more professional. The caption is important because it highlights the main benefit: privacy.
Step 2: Memory
LLMs do not remember previous messages unless you save the conversation. We use Streamlit’s session_state to keep track of the chat history:
if "messages" not in st.session_state:
    st.session_state["messages"] = [
        {"role": "assistant", "content": "How can I help you today?"}
    ]

session_state stores information between refreshes. Without it, the app forgets everything each time you interact with it.
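To make the history structure concrete, here is a plain-Python sketch (outside Streamlit) of how the message list grows turn by turn. The add_message helper is hypothetical, just for illustration; Streamlit itself only gives us the persistent place to keep the list.

```python
# Each entry is a dict with a "role" ("user" or "assistant") and its "content".
messages = [{"role": "assistant", "content": "How can I help you today?"}]

def add_message(history, role, content):
    """Append one conversation turn to the history (hypothetical helper)."""
    history.append({"role": role, "content": content})
    return history

add_message(messages, "user", "What is Ollama?")
add_message(messages, "assistant", "Ollama runs LLMs locally.")

# This whole list is the conversation context we can hand to the model.
print(len(messages))         # 3
print(messages[-1]["role"])  # assistant
```

The key point: "memory" is nothing more than this growing list being re-sent with every request.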
Step 3: Displaying History
Each time you use a Streamlit app, the script runs again from start to finish. We need to display the previous conversation so it stays visible:
# Display chat messages from history on app rerun
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

Step 4: The Interaction Loop
Here we collect the user’s input, display it, and then request a response from Ollama:
# Handle user input
if prompt := st.chat_input("What is on your mind?"):
    # 1. Display the user message immediately
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    # 2. Placeholder for the AI response
    with st.chat_message("assistant"):
        response_placeholder = st.empty()
        full_response = ""

        # 3. Call the local model with the full history so it has context.
        # We use stream=True so the text types out like in ChatGPT.
        stream = ollama.chat(
            model="llama3.2",
            messages=st.session_state.messages,
            stream=True,
        )

        # 4. Process the stream
        for chunk in stream:
            content = chunk["message"]["content"]
            if content:
                full_response += content
                response_placeholder.markdown(full_response + "▌")

        # Final update to remove the cursor
        response_placeholder.markdown(full_response)

    # 5. Save the AI's response to history
    st.session_state.messages.append({"role": "assistant", "content": full_response})

The stream=True parameter is important for a good user experience. Instead of waiting for the whole response, the app shows each word as it is generated, which creates a typing effect and makes the AI feel more responsive. Note that we pass the entire st.session_state.messages list rather than just the latest prompt; this is what actually gives the model memory of the conversation from Step 2.
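The chunk-handling logic is easier to see in isolation. Below is a small sketch that swaps the live Ollama stream for a hard-coded fake (fake_stream is my stand-in, not part of the library), shaped like the dicts ollama.chat yields when stream=True, and accumulates text the same way the loop above does:

```python
# Simulated stream: each chunk mimics {"message": {"content": "..."}}
fake_stream = [
    {"message": {"content": "Hello"}},
    {"message": {"content": ", "}},
    {"message": {"content": "world!"}},
]

full_response = ""
for chunk in fake_stream:
    content = chunk["message"]["content"]
    if content:
        full_response += content
        # In the real app, this is where the placeholder is redrawn:
        # response_placeholder.markdown(full_response + "▌")

print(full_response)  # Hello, world!
```

Each chunk carries only a few characters, so redrawing on every chunk is what produces the smooth typing effect.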
To run your app, open the terminal in the folder where you saved your code (for example, app.py) and enter:
streamlit run app.py
A browser window will open with your running chat app.
Closing Thoughts
You have now built a local LLM app. It may seem like a small project, but it represents a real shift in how we use this technology: the model, the data, and the conversation all stay on your machine.
Now you can safely paste sensitive documents, personal notes, or private code into this chat window without worrying about it being used to train a large company’s model.
If you found this article useful, you can follow me on Instagram for daily AI tips and practical resources. You might also like my latest book, Hands-On GenAI, LLMs & AI Agents. It’s a step-by-step guide to help you get ready for jobs in today’s AI field.