Connect Your LLM to Google Sheets

A big challenge I often see when teams start using Generative AI is not the model itself, but connecting the model to the data they use daily. Many people want to build smart agents, but most business data is still in spreadsheets. To create useful AI tools, you need to know how to connect your LLM to Google Sheets.

In this tutorial, you’ll learn how to build a pipeline that takes data from a Google Sheet, processes it with Python on your computer, and sends it to a local LLM for automated analysis.

Prerequisites: Setting Up the Environment

Before we start coding, let’s set up two things:

  1. our Google Cloud credentials
  2. our local LLM

Many junior engineers get stuck at this step, so let’s go through it together, step by step.

The Google Cloud Setup

To read a Google Sheet with your script, you need to give it an identity. You can do this by creating a Service Account:

  1. Go to the Google Cloud Console.
  2. Create a new project and enable the Google Sheets API and Google Drive API.
  3. Navigate to Credentials -> Create Credentials -> Service Account.
  4. Once created, go to the Service Account’s “Keys” tab and click Add Key -> Create New Key (JSON). Download this file to your machine.
  5. This next step is crucial. Open your JSON file, copy the client_email address, and share your Google Sheet with that email, just as you would with a coworker. If you miss this, you’ll get a 403 Permission Denied error.

The Local LLM Setup

We’ll use a local LLM so everything stays free and private:

  1. Download and install Ollama.
  2. Open your terminal and run: ollama pull llama3.

This command downloads the model. Ollama itself runs a local server on port 11434, which is what our script will talk to.
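Before going further, it helps to confirm the server is actually reachable. Here’s a minimal sketch that pings Ollama’s `/api/tags` endpoint (which lists your pulled models); the helper name `ollama_is_running` is my own, not part of Ollama:

```python
import requests

def ollama_is_running(host="http://localhost:11434", timeout=2):
    """Return True if a local Ollama server responds, False otherwise."""
    try:
        # /api/tags lists the models you have pulled locally
        return requests.get(f"{host}/api/tags", timeout=timeout).status_code == 200
    except requests.exceptions.RequestException:
        return False

print(ollama_is_running())  # False until Ollama is installed and running
```

If this prints False, start Ollama (or re-run the installer) before continuing.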

Next, install the required libraries in your environment:

pip install gspread pandas requests google-auth

Connect Your LLM to Google Sheets: A Practical Pipeline

Now let’s start coding. We’ll build this the same way you would set up a real-world data pipeline.

Step 1: Authentication

import gspread
from google.oauth2.service_account import Credentials
import pandas as pd
import requests

SCOPES = [
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive"   
]

creds = Credentials.from_service_account_file(
    "/Users/amankharwal/gsheet/gen-lang-client-0785384276-8bee5091be3f.json",
    scopes=SCOPES
)

client = gspread.authorize(creds)

Here, you set the API scopes, which control what your script can do, and link the Credentials object to the JSON key you downloaded.

Make sure the file path matches where you saved your JSON file. Once you’re authorized, gspread manages the session for you.

If you want to learn how to build more real-world AI pipelines like this, I’ve covered it step-by-step in my book Hands-On GenAI, LLMs & AI Agents.

Step 2: Load the Google Sheet

SHEET_ID = "1SM-JTztj_RVCM1mLaDdQ9SRp5efFzTV9Jp3G6Q0XdEQ"

sheet = client.open_by_key(SHEET_ID).sheet1

data = sheet.get_all_records()
df = pd.DataFrame(data)

print("Raw Data:")
print(df.head())
Raw Data:
Title ... Hours Viewed
0 The Night Agent: Season 1 ... 812100000
1 Ginny & Georgia: Season 2 ... 665100000
2 The Glory: Season 1 // 더 글로리: 시즌 1 ... 622800000
3 Wednesday: Season 1 ... 507700000
4 Queen Charlotte: A Bridgerton Story ... 503000000

[5 rows x 4 columns]

We use open_by_key() instead of open("Sheet Name").

The key is the long string between /d/ and /edit.

This is a good practice because if someone renames the spreadsheet, your pipeline will still work.
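If you only have the full URL on hand, you can pull the key out with a small regex. This is a hedged sketch; the helper name `extract_sheet_id` is my own:

```python
import re

def extract_sheet_id(url):
    """Pull the spreadsheet key out of a full Google Sheets URL."""
    match = re.search(r"/d/([a-zA-Z0-9_-]+)", url)
    if not match:
        raise ValueError(f"No sheet ID found in: {url}")
    return match.group(1)

url = "https://docs.google.com/spreadsheets/d/1SM-JTztj_RVCM1mLaDdQ9SRp5efFzTV9Jp3G6Q0XdEQ/edit#gid=0"
print(extract_sheet_id(url))  # 1SM-JTztj_RVCM1mLaDdQ9SRp5efFzTV9Jp3G6Q0XdEQ
```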

After that, we pull all the records and put them into a Pandas DataFrame.
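To see why this works, note that get_all_records() returns a list of dicts, one per row, keyed by the sheet’s header row, which is exactly the shape pandas accepts. A small illustration with made-up rows (the sample values are mine):

```python
import pandas as pd

# get_all_records() returns data shaped like this:
records = [
    {"Title": "The Night Agent: Season 1", "Hours Viewed": "812,100,000"},
    {"Title": "Ginny & Georgia: Season 2", "Hours Viewed": "665,100,000"},
]

df = pd.DataFrame(records)
print(df.shape)  # (2, 2)
```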

Step 3: Clean the Data

def clean_hours(x):
    # Strip thousands separators; fall back to 0 for blanks or bad cells
    try:
        return int(str(x).replace(",", ""))
    except (ValueError, TypeError):
        return 0

df["Hours Viewed"] = df["Hours Viewed"].apply(clean_hours)

# Optional: convert dates
df["Release Date"] = pd.to_datetime(df["Release Date"], errors="coerce")

Real-world data is often messy. In spreadsheets, numbers might be stored as strings with commas, like “1,200,000”. If you send that straight to an LLM or try to sort it in Python, it might fail or sort incorrectly.

To fix this, we use a simple apply function to convert these values into integers before doing any sorting or analysis.
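You can see the failure mode directly: sorting comma-formatted strings is lexicographic, so a small number like "9,500" outranks hundreds of millions. A quick demonstration with made-up values:

```python
values = ["9,500", "812,100,000", "1,200,000"]

# String sort is lexicographic: "9" > "8" > "1"
print(sorted(values, reverse=True))   # ['9,500', '812,100,000', '1,200,000']

# Cleaning first gives the correct numeric order
cleaned = [int(v.replace(",", "")) for v in values]
print(sorted(cleaned, reverse=True))  # [812100000, 1200000, 9500]
```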

Step 4: Local LLM Connection

def query_llm(prompt):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": prompt,
            "stream": False
        }
    )

    # Basic error handling (important in real workflows)
    if response.status_code != 200:
        raise RuntimeError(f"LLM Error: {response.text}")

    return response.json().get("response", "")

You don’t need big frameworks like LangChain to connect to an LLM. Ollama gives you a simple REST API on your computer, so a regular requests.post call is enough.

This keeps your dependencies minimal and your code easy to read.
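One refinement worth considering: local model calls can hang if the model is still loading, so a timeout lets the script fail fast instead of blocking. A sketch under my own naming (`query_llm_safe` and its parameters are hypothetical, not part of Ollama’s API):

```python
import requests

def query_llm_safe(prompt, model="llama3",
                   host="http://localhost:11434", timeout=120):
    """Like query_llm, but fails fast instead of hanging forever."""
    try:
        response = requests.post(
            f"{host}/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=timeout,
        )
        response.raise_for_status()
        return response.json().get("response", "")
    except requests.exceptions.RequestException as e:
        return f"LLM request failed: {e}"
```

Note that "stream": False makes Ollama return a single JSON object rather than a stream of partial chunks, which keeps parsing simple.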

Step 5: The Hybrid Analysis Approach

# Always use Python for heavy lifting
top_shows = df.sort_values(by="Hours Viewed", ascending=False).head(10)

text_data = top_shows.to_string(index=False)

prompt = f"""
You are analyzing Netflix viewership data.

Here are the top 10 shows based on hours viewed:

{text_data}

Tasks:
1. Identify patterns in top-performing content
2. Highlight any trends (release timing, global availability, etc.)
3. Keep output concise (5 bullet points max)
"""

result = query_llm(prompt)
print("LLM Insights:")
print(result)
LLM Insights:

Based on the Netflix viewership data, here are my findings:

**Patterns and Trends:**

• **New releases perform well**: Shows released in 2023 (The Night Agent, King the Land, Queen Charlotte) have strong viewer numbers, with an average of 630 million hours viewed.
• **Global availability matters**: All top-performing shows except La Reina del Sur are available globally, suggesting that global accessibility contributes to their success.
• **Drama and fantasy genres prevail**: The Night Agent, Ginny & Georgia, King the Land, and Queen Charlotte all belong to these genres, which tend to attract large audiences on Netflix.
• **Peak viewership around holidays and summer**: Shows released during holiday seasons (Ginny & Georgia) or summer (King the Land, Queen Charlotte) tend to perform well, possibly due to increased viewer engagement during breaks.

**Key Insights:**

• A strong correlation exists between new releases and high viewership numbers.

This is the most important lesson in this tutorial: don’t use LLMs to sort data or do math. They are designed to generate text, not to act as calculators.

Instead, use the Hybrid Approach:

  1. Use Pandas (deterministic logic) to confidently find the top 10 shows.
  2. Convert that filtered slice of data into text.
  3. Feed only the relevant context to the LLM to do what it does best: pattern recognition, summarization, and qualitative analysis.
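The three steps above can be sketched end to end with toy data (the show names and numbers here are mine, for illustration):

```python
import pandas as pd

# Step 1: deterministic heavy lifting with pandas
df = pd.DataFrame({
    "Title": ["Show A", "Show B", "Show C"],
    "Hours Viewed": [500, 900, 700],
})
top = df.sort_values(by="Hours Viewed", ascending=False).head(2)

# Step 2: convert only the filtered slice to text
text_data = top.to_string(index=False)

# Step 3: feed just that context to the LLM
prompt = f"Here are the top shows:\n\n{text_data}\n\nSummarize the pattern."
print(prompt)
```

Only the top two shows reach the prompt; everything below the cutoff never costs a token.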

When you run this script, your terminal will show the raw data, sort it instantly with Pandas, and in a few seconds, Llama 3 will give you five short bullet points explaining why those Netflix shows did so well.

Closing Thoughts

AI engineering is really just data engineering with a probabilistic API at the end. Don’t expect the model to handle everything. The key to building reliable AI tools at work isn’t about finding a smarter model, but about creating a smarter pipeline.

When you connect your LLM to Google Sheets and use Python to clean and manage the data, you use fewer tokens, avoid mistakes with math or sorting, and create a system that reliably solves real business problems.

I hope you found this article on connecting your LLM to Google Sheets helpful.

For more AI and machine learning tips, follow me on Instagram. My book, Hands-On GenAI, LLMs & AI Agents, can also help you grow your AI career.

Aman Kharwal

AI/ML Engineer | Published Author. My aim is to decode data science for the real world in the most simple words.