We’ve all watched Iron Man and quietly envied Tony Stark’s banter with JARVIS. There is something profoundly magical about speaking to a machine and having it understand, think, and reply, not with a pre-programmed script, but with genuine intelligence. For a long time, building something like that required a PhD or massive cloud subscriptions. Today, we can make it in an afternoon with a few lines of Python. In this article, we will build a voice AI assistant that listens to you, thinks using a powerful local AI model (Llama 3), and responds to you.
How Does a Voice AI Assistant Work in Real Time?
Before we write the code, let’s understand what we are building. An AI voice assistant is essentially a loop of three distinct biological functions replicated in code:
- The Ears (Speech-to-Text): We capture audio vibrations and translate them into text.
- The Brain (LLM Inference): We send that text to a Large Language Model (Ollama/Llama 3) to generate a smart response.
- The Mouth (Text-to-Speech): We convert the AI’s text response back into audio so we can hear it.
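Strung together, the three stages form a simple loop. Here is a minimal sketch with stand-in functions (placeholders only; the real implementations follow in the steps below):

```python
# A minimal sketch of the Listen -> Think -> Speak cycle.
# Each stub stands in for the real implementation built later.

def listen():
    # Stand-in for speech-to-text: pretend the user said this.
    return "hello"

def think(text):
    # Stand-in for the LLM call: echo a canned reply.
    return f"You said: {text}"

def speak(text):
    # Stand-in for text-to-speech: just print.
    print(text)

# One turn of the assistant's cycle:
speak(think(listen()))  # prints "You said: hello"
```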
Let’s understand it practically by building a real-time voice AI assistant using Python.
Building a Real-Time Voice AI Assistant
To get it running, we are relying on three key libraries. You will need to install them via your terminal:
pip install speechrecognition ollama pyttsx3 pyaudio
You must have the Ollama application installed on your computer and the Llama 3 model pulled (ollama pull llama3) for the brain part of our code to work. Here’s a tutorial if you are a first-timer.
Step 1: Importing the Tools
Here we are grabbing our tools. Think of this as laying out your ingredients before cooking. sr is our listener, ollama is our thinker, and pyttsx3 is our speaker:
import speech_recognition as sr
import ollama
import pyttsx3
Step 2: The Ears
This function is responsible for the physical world interface, the microphone:
def listen():
    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            print("Listening... (Speak now)")
            # Adjust for ambient noise
            recognizer.adjust_for_ambient_noise(source, duration=0.5)
            # Listen for audio input
            audio = recognizer.listen(source, timeout=5, phrase_time_limit=10)
        print("Processing...")
        # Recognize speech using Google's free API
        text = recognizer.recognize_google(audio)
        print(f"You said: {text}")
        return text
    except sr.WaitTimeoutError:
        print("No speech detected (timeout).")
        return None
    except sr.UnknownValueError:
        print("Sorry, I didn't catch that.")
        return None
    except sr.RequestError:
        print("Speech recognition service unavailable.")
        return None
    except Exception as e:
        print(f"An error occurred in listen(): {e}")
        return None

Here’s what’s happening:
- adjust_for_ambient_noise: Microphones pick up fan hums and static. This line tells the code to listen to the silence for 0.5 seconds to understand the room’s baseline noise, which makes the actual recognition much more accurate.
- recognize_google: We are using Google’s Web Speech API to convert audio to text. It’s free and generally very accurate, though it does require an internet connection.
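Since recognize_google needs the network, a common refinement is to try one recognizer and fall back to another if it fails. SpeechRecognition also ships an offline recognize_sphinx (it requires the separate pocketsphinx package); the fallback logic itself is just a chain of callables, sketched here with hypothetical stand-in functions:

```python
# Generic fallback chain: try each recognizer in order and return
# the first successful result. The stand-ins below are hypothetical;
# in the real script they would wrap recognizer.recognize_google()
# and recognizer.recognize_sphinx().

def recognize_online(audio):
    # Stand-in for recognizer.recognize_google(audio)
    raise ConnectionError("no internet")

def recognize_offline(audio):
    # Stand-in for recognizer.recognize_sphinx(audio)
    return "hello world"

def transcribe(audio, recognizers):
    for recognize in recognizers:
        try:
            return recognize(audio)
        except Exception as e:
            print(f"{recognize.__name__} failed: {e}")
    return None

text = transcribe(b"...", [recognize_online, recognize_offline])
print(text)  # "hello world", via the offline fallback
```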
Step 3: The Brain
This is the key part of our assistant. We take the raw text and give it intelligence:
def think(text: str):
    if not text:
        return None
    print("Thinking...")
    try:
        # Ensure you have pulled the model via: ollama pull llama3
        response = ollama.chat(
            model="llama3",
            messages=[
                {
                    "role": "user",
                    "content": text,
                }
            ],
        )
        response_text = response["message"]["content"]
        print(f"AI: {response_text}")
        return response_text
    except Exception as e:
        print(f"An error occurred in think(): {e}")
        return "Sorry, something went wrong while thinking."

Here’s what’s happening:
- ollama.chat: This is the interface to your local Llama 3 model. We send a list of messages (in this case, just one from the “user”) and wait for the model to complete the pattern.
- Latency: Since Llama 3 is running locally on your device, this might take a second or two, depending on your GPU/CPU, but it’s completely private. No data is sent to a cloud server for thinking.
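As written, think() sends only the latest utterance, so the assistant forgets everything between turns. Ollama's chat API accepts the whole conversation as the messages list, so giving the assistant memory is just a matter of appending each exchange to a shared history. Here is a sketch of that bookkeeping, with the actual ollama call stubbed out:

```python
# Keep the running conversation so the model sees previous turns.
history = []

def fake_llm(messages):
    # Stand-in for ollama.chat(model="llama3", messages=messages);
    # the real call returns {"message": {"content": ...}}.
    return {"message": {"content": f"reply #{len(messages)}"}}

def think_with_memory(text):
    history.append({"role": "user", "content": text})
    response = fake_llm(history)
    reply = response["message"]["content"]
    # Store the assistant's answer so the next turn can refer to it.
    history.append({"role": "assistant", "content": reply})
    return reply

think_with_memory("My name is Sam.")
think_with_memory("What is my name?")
print(len(history))  # 4 messages: two turns of user + assistant
```

With the real model, passing history instead of a single message is what lets a follow-up like "What is my name?" actually work.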
Step 4: The Mouth
An assistant isn’t an assistant if you have to read the screen. Here’s how to give it the ability to speak:
def speak(text: str):
    if not text:
        return
    try:
        engine = pyttsx3.init()
        # Optional: Change voice properties
        voices = engine.getProperty("voices")
        if voices:
            # Try changing index 0 -> 1 for an alternative voice
            engine.setProperty("voice", voices[0].id)
        engine.setProperty("rate", 175)  # Speed of speech
        engine.say(text)
        engine.runAndWait()
    except Exception as e:
        print(f"An error occurred in speak(): {e}")

Here’s what’s happening:
- pyttsx3.init(): This initialises the speech engine driver on your OS (sapi5 on Windows, nsss on macOS, espeak on Linux).
- engine.runAndWait(): This is critical. It blocks the code execution until the speaking is done. Without this, the program might try to listen while it’s still speaking, causing it to hear itself!
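One practical wrinkle: Llama 3 often answers in Markdown, and pyttsx3 will read the asterisks and backticks out loud. A small cleanup pass before speaking keeps the audio natural. The regexes below are a rough sketch, not a full Markdown parser:

```python
import re

def clean_for_speech(text: str) -> str:
    # Strip common Markdown markers the model tends to emit.
    text = re.sub(r"[*_`#]+", "", text)  # emphasis, code, headings
    # Replace [label](url) links with just the label.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Collapse leftover whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(clean_for_speech("**Hello!** Check [the docs](https://example.com)."))
# Hello! Check the docs.
```

You would call speak(clean_for_speech(text)) instead of speak(text).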
Step 5: The Main Function
Finally, we stitch the organs together into a living body:
def main():
    print("--- Voice Assistant Started ---")
    speak("Hello, I am ready. You can start speaking.")
    while True:
        # 1. Listen
        user_input = listen()
        # Skip if nothing heard
        if not user_input:
            continue
        # 2. Check for exit keywords
        if user_input.lower().strip() in ["exit", "stop", "quit"]:
            speak("Goodbye!")
            print("Exiting...")
            break
        # 3. Think
        ai_response = think(user_input)
        # 4. Speak
        speak(ai_response)

if __name__ == "__main__":
    main()

Here’s what’s happening:
- The while True Loop: This creates the always-on behaviour. The program enters a cycle of Listen -> Think -> Speak, and then immediately goes back to Listen.
- Exit Strategy: We added a simple check for exit, stop, or quit so you can gracefully shut down the assistant without force-quitting the terminal.
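The exact-match check is brittle: "stop please" or "Exit." won't trigger it because of the extra word or punctuation. A slightly more forgiving matcher, as a sketch (the word list, including the added "goodbye", is just a suggestion):

```python
import string

EXIT_WORDS = {"exit", "stop", "quit", "goodbye"}

def is_exit_command(text: str) -> bool:
    # Strip punctuation, lowercase, and look for any exit word.
    words = text.lower().translate(
        str.maketrans("", "", string.punctuation)
    ).split()
    return any(word in EXIT_WORDS for word in words)

print(is_exit_command("Exit."))            # True
print(is_exit_command("stop please"))      # True
print(is_exit_command("tell me a story"))  # False
```

Per-word matching still has limits (a sentence like "don't stop" would also trigger it), but it handles the punctuation and filler words speech recognition tends to produce.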
Closing Thoughts
When you run this script and hear the AI respond to your voice, take a moment to appreciate what just happened. You essentially built a synthetic neocortex (Llama 3) and gave it sensory organs (mic/speakers).
Today it’s just chatting; tomorrow, you could hook the think() function up to your calendar API or email client, turning this from a chatbot into a true proactive agent.
I hope you liked this article on building a voice AI assistant that listens to you, thinks using a powerful local AI model, and speaks back to you. Follow me on Instagram for many more resources.