A big challenge I often see when teams start using Generative AI is not the model itself, but connecting the model to the data they use daily. Many people want to build smart agents, but most business data is still in spreadsheets. To create useful AI tools, you need to know how to connect your LLM to Google Sheets.
In this tutorial, you’ll learn how to build a pipeline that takes data from a Google Sheet, processes it with Python on your computer, and sends it to a local LLM for automated analysis.
Prerequisites: Setting Up the Environment
Before we start coding, let’s set up two things first:
- our Google Cloud credentials
- and our local LLM
Many junior engineers get stuck at this step, so let’s go through it together, step by step.
The Google Cloud Setup
To read a Google Sheet with your script, you need to give it an identity. You can do this by creating a Service Account:
- Go to the Google Cloud Console.
- Create a new project and enable the Google Sheets API and Google Drive API.
- Navigate to Credentials -> Create Credentials -> Service Account.
- Once created, go to the Service Account’s “Keys” tab and click Add Key -> Create New Key (JSON). Download this file to your machine.
- This next step is crucial. Open your JSON file, copy the client_email address, and share your Google Sheet with that email, just as you would with a coworker. If you miss this, you’ll get a 403 Permission Denied error.
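Before moving on, it can help to confirm exactly which email address you need to share the sheet with. A small sketch that reads it straight from the downloaded key file (the filename here is a placeholder, not the real key name):

```python
import json

def service_account_email(key_path):
    """Return the client_email from a downloaded service-account key file."""
    with open(key_path) as f:
        return json.load(f)["client_email"]

# Share your Google Sheet with whatever this prints:
# print(service_account_email("service_account.json"))
```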
The Local LLM Setup
We’ll use a local LLM so everything stays free and private:
- Download and install Ollama.
- Open your terminal and run: ollama pull llama3.
This command downloads the model. Once Ollama is installed, it runs a local server on port 11434, which is where our script will send its requests.
Next, install the required libraries in your environment:
pip install gspread pandas requests google-auth
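With the libraries installed, you can sanity-check that the Ollama server is reachable before writing the pipeline. A quick sketch against Ollama's /api/tags endpoint, which lists the models you have pulled (the base URL is Ollama's default; adjust it if you changed yours):

```python
import requests

def model_names(tags_payload):
    """Extract model names from the /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_local_models(base_url="http://localhost:11434"):
    """Ask the local Ollama server which models it has available."""
    resp = requests.get(f"{base_url}/api/tags", timeout=5)
    resp.raise_for_status()
    return model_names(resp.json())

# If "llama3" is missing from the list, re-run: ollama pull llama3
```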
Connect Your LLM to Google Sheets: A Practical Pipeline
Now let’s start coding. We’ll build this the same way you would set up a real-world data pipeline.
Step 1: Authentication
import gspread
from google.oauth2.service_account import Credentials
import pandas as pd
import requests
SCOPES = [
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive"
]

creds = Credentials.from_service_account_file(
    "/Users/amankharwal/gsheet/gen-lang-client-0785384276-8bee5091be3f.json",
    scopes=SCOPES
)

client = gspread.authorize(creds)

Here, you set the API scopes, which control what your script can do, and link the Credentials object to the JSON key you downloaded.
Make sure the file path matches where you saved your JSON file. Once you’re authorized, gspread manages the session for you.
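A hardcoded absolute path breaks the moment the script runs on another machine. One option is to resolve the key path from an environment variable instead; a sketch, where the variable name GSHEET_KEY_PATH and the fallback filename are my own assumptions:

```python
import os

def resolve_key_path(env_var="GSHEET_KEY_PATH", fallback="service_account.json"):
    """Prefer an environment variable so the same script works on any machine."""
    path = os.environ.get(env_var, fallback)
    if not os.path.exists(path):
        raise FileNotFoundError(f"Service-account key not found at {path}")
    return path
```

You would then pass resolve_key_path() to Credentials.from_service_account_file instead of the literal path.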
If you want to learn how to build more real-world AI pipelines like this, I’ve covered it step-by-step in my book Hands-On GenAI, LLMs & AI Agents.
Step 2: Load the Google Sheet
SHEET_ID = "1SM-JTztj_RVCM1mLaDdQ9SRp5efFzTV9Jp3G6Q0XdEQ"
sheet = client.open_by_key(SHEET_ID).sheet1
data = sheet.get_all_records()
df = pd.DataFrame(data)
print("Raw Data:")
print(df.head())

Raw Data:
Title ... Hours Viewed
0 The Night Agent: Season 1 ... 812100000
1 Ginny & Georgia: Season 2 ... 665100000
2 The Glory: Season 1 // 더 글로리: 시즌 1 ... 622800000
3 Wednesday: Season 1 ... 507700000
4 Queen Charlotte: A Bridgerton Story ... 503000000
[5 rows x 4 columns]
We use open_by_key() instead of open("Sheet Name"). You can find the sheet ID in the spreadsheet's URL, between /d/ and /edit. This is a good practice because if someone renames the spreadsheet, your pipeline will still work.
After that, we pull all the records and put them into a Pandas DataFrame.
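If you only have the full URL on hand, you can extract the ID programmatically rather than copying it out by hand. A small sketch (the regex assumes the standard docs.google.com URL shape):

```python
import re

def sheet_id_from_url(url):
    """Pull the spreadsheet ID out of a Google Sheets URL:
    the part between /d/ and the next slash."""
    match = re.search(r"/d/([a-zA-Z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"No sheet ID found in {url!r}")
    return match.group(1)
```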
Step 3: Clean the Data
def clean_hours(x):
    try:
        return int(str(x).replace(",", ""))
    except (ValueError, TypeError):
        return 0

df["Hours Viewed"] = df["Hours Viewed"].apply(clean_hours)

# Optional: convert dates
df["Release Date"] = pd.to_datetime(df["Release Date"], errors="coerce")

Real-world data is often messy. In spreadsheets, numbers might be stored as strings with commas, like "1,200,000". If you send that straight to an LLM or try to sort it in Python, it might fail or sort incorrectly.
To fix this, we use a simple apply function to turn these values into integers.
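To see exactly what the cleaning step does, here is the same logic exercised on a few typical messy values (the sample values are made up):

```python
def clean_hours(x):
    """Strip thousands separators and coerce to int; anything unparseable becomes 0."""
    try:
        return int(str(x).replace(",", ""))
    except (ValueError, TypeError):
        return 0

samples = ["1,200,000", 507700000, "N/A", None]
print([clean_hours(s) for s in samples])  # [1200000, 507700000, 0, 0]
```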
Step 4: Local LLM Connection
def query_llm(prompt):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": prompt,
            "stream": False
        }
    )
    # Basic error handling (important in real workflows)
    if response.status_code != 200:
        raise Exception(f"LLM Error: {response.text}")
    return response.json().get("response", "")

You don’t need big frameworks like LangChain to connect to an LLM. Ollama gives you a simple REST API on your computer, so a regular requests.post call is enough.
This keeps your dependencies minimal and your code easy to read.
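Even locally, calls can time out under load or fail while the server restarts. A sketch of a more defensive variant (the retry count, backoff schedule, and the injectable post parameter for testing are my own choices, not part of Ollama's API):

```python
import time
import requests

def query_llm_with_retry(prompt, post=requests.post, retries=3, timeout=120):
    """Like query_llm, but with a request timeout and backoff on transient failures."""
    for attempt in range(retries):
        try:
            response = post(
                "http://localhost:11434/api/generate",
                json={"model": "llama3", "prompt": prompt, "stream": False},
                timeout=timeout,
            )
            response.raise_for_status()
            return response.json().get("response", "")
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # wait 1s, then 2s, before retrying
```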
Step 5: The Hybrid Analysis Approach
# Always use Python for heavy lifting
top_shows = df.sort_values(by="Hours Viewed", ascending=False).head(10)
text_data = top_shows.to_string(index=False)
prompt = f"""
You are analyzing Netflix viewership data.
Here are the top 10 shows based on hours viewed:
{text_data}
Tasks:
1. Identify patterns in top-performing content
2. Highlight any trends (release timing, global availability, etc.)
3. Keep output concise (5 bullet points max)
"""
result = query_llm(prompt)LLM Insights:
Based on the Netflix viewership data, here are my findings:
**Patterns and Trends:**
• **New releases perform well**: Shows released in 2023 (The Night Agent, King the Land, Queen Charlotte) have strong viewer numbers, with an average of 630 million hours viewed.
• **Global availability matters**: All top-performing shows except La Reina del Sur are available globally, suggesting that global accessibility contributes to their success.
• **Drama and fantasy genres prevail**: The Night Agent, Ginny & Georgia, King the Land, and Queen Charlotte all belong to these genres, which tend to attract large audiences on Netflix.
• **Peak viewership around holidays and summer**: Shows released during holiday seasons (Ginny & Georgia) or summer (King the Land, Queen Charlotte) tend to perform well, possibly due to increased viewer engagement during breaks.
**Key Insights:**
• A strong correlation exists between new releases and high viewership numbers.
This is the most important lesson in this tutorial: don’t use LLMs to sort data or do math. They are designed to generate text, not to act as calculators.
Instead, use the Hybrid Approach:
- Use Pandas (deterministic logic) to confidently find the top 10 shows.
- Convert that filtered slice of data into text.
- Feed only the relevant context to the LLM to do what it does best: pattern recognition, summarization, and qualitative analysis.
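This division of labor can be made explicit in a helper that computes the numbers first and only then builds the prompt. A sketch reusing the DataFrame from the steps above (the prompt wording is illustrative):

```python
import pandas as pd

def build_analysis_prompt(df, top_n=10):
    """Pandas does the sorting and arithmetic; the LLM only sees pre-computed text."""
    top = df.sort_values("Hours Viewed", ascending=False).head(top_n)
    avg = top["Hours Viewed"].mean()  # exact math, done deterministically
    return (
        f"Top {top_n} shows by hours viewed (average: {avg:,.0f} hours):\n"
        f"{top.to_string(index=False)}\n"
        "Summarize the patterns in at most 5 bullet points."
    )
```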
When you run this script, your terminal first prints the raw data, Pandas sorts it instantly, and a few seconds later Llama 3 returns five short bullet points explaining why those Netflix shows did so well.
Closing Thoughts
AI engineering is really just data engineering with a probabilistic API at the end. Don’t expect the model to handle everything. The key to building reliable AI tools at work isn’t about finding a smarter model, but about creating a smarter pipeline.
When you connect your LLM to Google Sheets and use Python to clean and manage the data, you use fewer tokens, avoid mistakes with math or sorting, and create a system that reliably solves real business problems.
I hope you found this article on connecting your LLM to Google Sheets helpful.
For more AI and machine learning tips, follow me on Instagram. My book, Hands-On GenAI, LLMs & AI Agents, can also help you grow your AI career.