For years, the idea of truly customizing a Large Language Model felt like a privilege reserved for tech giants with bottomless budgets. The barrier to entry wasn’t just knowledge; it was the astronomical cost of computing power needed to retrain billions of parameters. That’s where a technique called LoRA (Low-Rank Adaptation) comes in. It just made fine-tuning accessible to everyone. In this article, I’ll take you through a step-by-step guide to fine-tuning LLMs with LoRA.
The Billion-Parameter Problem: Why Fine-Tuning Was So Hard
Let’s get straight to the point. An LLM is essentially a massive collection of numbers, called weights or parameters. A model like Llama 3 8B has 8 billion of them. When we “fine-tune” a model, we’re adjusting all of these numbers so the model gets better at a specific task, like writing in a particular brand’s voice.
The traditional approach, full fine-tuning, means updating every single one of those 8 billion parameters. Doing that requires holding gradients and optimizer states for all of them in GPU memory, which costs several times more than the model itself. This is why, for a long time, customizing LLMs was out of reach for most of us.
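To get a sense of the scale, here is a rough back-of-the-envelope estimate (a common approximation, not an exact figure): with the Adam optimizer in mixed precision, each trainable parameter costs roughly 16 bytes of GPU memory once you count fp16 weights and gradients plus fp32 master weights and optimizer moments, and activations come on top of that:
# Rough memory needed to fully fine-tune an 8B-parameter model with Adam in mixed precision.
# ~16 bytes per parameter: 2 (fp16 weights) + 2 (fp16 gradients) + 4 (fp32 master weights) + 8 (fp32 Adam moments).
params = 8e9
bytes_per_param = 16
print(f"~{params * bytes_per_param / 1e9:.0f} GB")  # ~128 GB, far beyond a single consumer GPU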
Introducing LoRA
So, how do we get around this? We use a smarter approach from a family of techniques called Parameter-Efficient Fine-Tuning (PEFT). The star of this family is LoRA (Low-Rank Adaptation).
Here’s how LoRA works under the hood, in simple terms:
- Freeze the Original Model: We lock all of the LLM’s billions of original parameters. We don’t train them at all.
- Inject Tiny “Adapter” Layers: LoRA injects a pair of very small, new layers (called matrices A and B) alongside the original ones, especially in the “attention” parts of the model where the most important learning happens.
- Train Only the Adapters: During training, we only update the parameters in these tiny new layers. Instead of training billions of parameters, we might only train a few million (e.g., 0.1% of the total!).
These small matrices are “low-rank,” a fancy linear algebra term for saying they can capture the most important information in a very compressed way. By training them, we’re teaching the model a “delta” or a “change” needed for our new task, without ever touching its original knowledge base.
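To make that concrete, here is a minimal sketch of the idea in PyTorch (the sizes are illustrative, not taken from a real model): instead of learning a full update to a weight matrix W of shape d_out x d_in, LoRA learns two thin matrices, A of shape r x d_in and B of shape d_out x r, and the effective weight becomes W + B @ A.
import torch

d_out, d_in, r = 768, 768, 8              # illustrative sizes; r is the LoRA rank

W = torch.randn(d_out, d_in)              # frozen pretrained weight (never updated)
A = torch.randn(r, d_in) * 0.01           # tiny trainable matrix A
B = torch.zeros(d_out, r)                 # tiny trainable matrix B (starts at zero, so training begins from the original model)

delta_W = B @ A                           # the low-rank "delta", same shape as W
W_adapted = W + delta_W                   # effective weight the adapted model uses

print(W.numel(), A.numel() + B.numel())   # 589824 vs 12288 -> about 2% of the parameters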
Fine-Tuning LLMs to Write Positive Reviews using LoRA
Here, our goal is to take a standard GPT-2 model and teach it to write like a positive movie critic. We’ll fine-tune it on positive reviews from the famous IMDB dataset.
Step 1: Setting Up the Environment
First, we need our tools. We’ll be using the Hugging Face ecosystem, which makes this process incredibly smooth. Make sure to install these libraries:
!pip install -q transformers datasets peft trl accelerate
We are using:
- transformers & datasets: For loading our base model and the IMDB data.
- peft: The library that contains the functions to use LoRA.
- accelerate: A helper for running this efficiently on our GPU.
Step 2: The “Before” Snapshot – How Does a Base Model Behave?
Let’s see what a fresh-out-of-the-box GPT-2 model does with a simple prompt. This is our baseline:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset
# The model we want to fine-tune
model_name = "gpt2"
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Set the padding token if it's not already set
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token # Use the end-of-sequence token as the padding token
# Load the model
model = AutoModelForCausalLM.from_pretrained(model_name)
# A prompt to test the model
prompt = "The movie started with a captivating scene that"
# Tokenize the input
inputs = tokenizer(prompt, return_tensors="pt")
# Generate a completion
# We're moving the model and inputs to the GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
inputs = inputs.to(device)
# Generate text (passing the attention mask and pad token id avoids generation warnings)
generate_ids = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, pad_token_id=tokenizer.eos_token_id, max_length=50)
response = tokenizer.decode(generate_ids[0], skip_special_tokens=True)
print("--- Base Model Response ---")
print(response)
--- Base Model Response ---
The movie started with a captivating scene that was so realistic that I thought it was real. I was so excited to see the movie and I was so excited to see the movie. I was so excited to see the movie.
It’s a bit repetitive and generic. It completes the sentence, but it has no specific style or direction. Let’s fix that.
Step 3: Data Preparation
We need to teach our model what a positive review looks like. We’ll load the IMDB dataset, filter it for only positive reviews, and then format each one into a clear, instructive text string:
# Load the IMDB dataset
dataset = load_dataset("imdb", split="train")
# Filter for only positive reviews (label 1)
positive_reviews = dataset.filter(lambda example: example["label"] == 1)
# To make this demo run quickly, let's just use a small subset of the data
small_dataset = positive_reviews.select(range(500)) # Using 500 examples for speed
# We need to format our examples into a single text string for the Trainer
def format_review(example):
# For this simple task, the text itself is our training data
return {"text": "Review: " + example["text"] + " TL;DR: Positive."}
formatted_dataset = small_dataset.map(format_review)
By formatting the data like this (Review: … TL;DR: Positive.), we’re giving the model a clear pattern to learn. It’s like telling a student, “Here’s a piece of text, and the summary is ‘Positive’. Now learn this pattern.”
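If you want to sanity-check the formatting, you can print one processed example (the exact review text will vary, but the pattern should always be there):
# Peek at one processed example
print(formatted_dataset[0]["text"][:100])   # starts with "Review: ..."
print(formatted_dataset[0]["text"][-20:])   # ends with "... TL;DR: Positive."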
Step 4: Configuring LoRA
Now, we will define our LoraConfig that tells the peft library how and where to inject its tiny adapter layers:
from peft import LoraConfig, get_peft_model
# Create the LoRA configuration
lora_config = LoraConfig(
r=8, # The rank of the update matrices. A small number is usually sufficient.
lora_alpha=16, # A scaling factor. A good rule of thumb is to set this to 2*r.
target_modules=["c_attn"], # The specific layers to adapt. For GPT-2, this is the attention layer.
lora_dropout=0.1,
bias="none",
task_type="CAUSAL_LM"
)
# Wrap the base model with the PEFT model
peft_model = get_peft_model(model, lora_config)
# Let's see how many parameters we are actually training!
peft_model.print_trainable_parameters()
trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364
Look at that! We’re now training less than a quarter of one percent of the total parameters (about 0.24%). We went from updating nearly 125 million parameters to roughly 295 thousand. This is why you don’t need a supercomputer.
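That 294,912 figure is easy to verify by hand. GPT-2 small has 12 transformer blocks, and in each one the c_attn layer projects the 768-dimensional hidden state to 2304 dimensions (query, key, and value stacked together). A rank-8 adapter on a layer of that shape adds r * d_in + d_out * r parameters:
r = 8
d_in, d_out = 768, 2304                    # GPT-2 small: c_attn maps hidden size 768 -> 3 * 768 (Q, K, V)
n_layers = 12                              # number of transformer blocks in GPT-2 small

params_per_layer = r * d_in + d_out * r    # A is (r x d_in), B is (d_out x r)
print(params_per_layer * n_layers)         # 294912 -- matches print_trainable_parameters()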
Step 5: The Training Session
Now, we set up the training process using the standard Hugging Face Trainer. This will feed our formatted examples to the model and update only our small LoRA weights:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
# Safety for training
peft_model.config.use_cache = False
tokenizer.padding_side = "right"
if tokenizer.pad_token_id is None:
tokenizer.pad_token = tokenizer.eos_token
peft_model.config.pad_token_id = tokenizer.pad_token_id
# Tokenize dataset
def tokenize_fn(batch):
return tokenizer(
batch["text"],
truncation=True,
padding="max_length",
max_length=512,
)
tokenized_ds = formatted_dataset.map(
tokenize_fn,
batched=True,
remove_columns=formatted_dataset.column_names,
)
# Causal LM collator (no MLM)
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm=False,
)
training_args = TrainingArguments(
output_dir="./gpt2-imdb-finetune",
per_device_train_batch_size=2,
gradient_accumulation_steps=2,
learning_rate=2e-4,
num_train_epochs=2,
logging_steps=50,
fp16=torch.cuda.is_available(),  # use mixed precision only when a GPU is available
bf16=False,
remove_unused_columns=False,
)
trainer = Trainer(
model=peft_model,
args=training_args,
train_dataset=tokenized_ds,
data_collator=data_collator,
)
print("Starting training...")
trainer.train()
print("Training complete!")
Step 6: The “After” Snapshot – Our Specialized Model
The training is done. Let’s give our fine-tuned model the same prompt and see how its personality has changed:
# Let's test the fine-tuned model with the same prompt
print("\n--- Fine-Tuned Model Response ---")
# The trainer wraps the model, so we use trainer.model
fine_tuned_model = trainer.model
# Generate text using the fine-tuned model (switch to eval mode first)
fine_tuned_model.eval()
generate_ids = fine_tuned_model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, pad_token_id=tokenizer.eos_token_id, max_length=50)
response = tokenizer.decode(generate_ids[0], skip_special_tokens=True)
print(response)
--- Fine-Tuned Model Response ---
The movie started with a captivating scene that really pulled me in. The characters were well-developed and the story was compelling. I was on the edge of my seat the whole time. TL;DR: Positive.
See the difference? It’s not just completing the sentence anymore. It’s adopting the style of a positive movie review that we taught it. It learned the pattern and is now applying it to new text.
Final Words
We just walked through a complete example, but the real journey starts now. Think about a problem you care about:
- Could you fine-tune a model to write code documentation in your team’s specific style?
- Could you create a chatbot that responds to customer emails with a cheerful, helpful tone?
- Could you build a tool to summarize dense research papers into simple, easy-to-read paragraphs?
You no longer need a massive budget. You just need a good idea, a decent GPU, and the willingness to learn.
I hope you liked this step-by-step guide to fine-tuning LLMs with LoRA. Feel free to ask your questions in the comments section below. You can follow me on Instagram for many more resources.





