Diffusion models are generative models that create new data, such as images. At a high level, a diffusion model starts with pure noise (like TV static) and removes that noise step by step until a clear image forms, guided by your text prompt. This process makes diffusion models powerful for creative tasks like AI art, image editing, and even video generation. So, if you want to learn AI image generation, this article is for you. In this article, I’ll walk you through AI Image Generation using Diffusion Models with Python.
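To build intuition for that "noise to image" idea, here is a purely illustrative toy sketch. The `toy_denoise` function below is my own invention for this article, not how Stable Diffusion actually works: real diffusion models learn a neural network that predicts the noise to remove, while this toy simply blends random values toward a target a little more each step:

```python
import random

def toy_denoise(target, steps=10, seed=42):
    """Purely illustrative: start from random noise and blend toward a
    target signal a little more each step. Real diffusion models instead
    use a learned network to predict the noise to subtract."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]  # start from pure noise
    for step in range(1, steps + 1):
        alpha = step / steps               # fraction of noise removed so far
        x = [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, target)]
    return x

target = [0.2, 0.5, 0.9]
result = toy_denoise(target)
print(result)  # the final step lands exactly on the target
```

The key takeaway is the shape of the loop: many small refinement steps, each producing a slightly less noisy version of the previous one.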
AI Image Generation using Diffusion Models
First, let’s get the tools we need. We’ll use Hugging Face Diffusers, PyTorch, and a few utility libraries. Here’s how to install them if you are not using Google Colab:
pip install diffusers transformers accelerate torch torchvision safetensors
Here’s a breakdown of all these libraries:
- diffusers: the main library for diffusion models.
- transformers: provides text processing (turning words into embeddings).
- accelerate: helps run the model efficiently on different devices (CPU, GPU, or Apple Silicon).
- torch & torchvision: core deep learning framework powering Stable Diffusion.
- safetensors: safer model weights format.
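Before moving on, you can optionally sanity-check that everything installed correctly. This small snippet (my own convenience check, not required for the tutorial) only tests that each package is importable:

```python
import importlib.util

# Optional sanity check: verify each required package is importable
# before loading the pipeline.
required = ["diffusers", "transformers", "accelerate",
            "torch", "torchvision", "safetensors"]
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```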
Step 1: Set Up the Environment
Now, let’s prepare the pipeline for AI Image Generation using Diffusion Models. We’ll write code in a way that works on a GPU, Apple M1/M2 chip, or just a CPU:
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)
if device == "cuda":
    pipe.enable_attention_slicing()
Here’s what’s happening here:
- We load Stable Diffusion v1.5, a widely used model for text-to-image.
- We detect your hardware, such as CUDA (NVIDIA GPU), MPS (Apple Silicon), or CPU.
- We set the Euler Ancestral sampler, a fast, high-quality sampling method.
- On the GPU, we enable attention slicing to save memory.
If you’re on CPU, it will still work, just slower.
Step 2: Write Your First Prompt
Now comes the prompting part. It means telling the model what to generate:
prompt = "ultra-detailed portrait of a red fox wearing a tiny scarf, cinematic lighting, 35mm"
negative_prompt = "blurry, lowres, jpeg artifacts, extra fingers, text, watermark"
The prompt describes what you want in the image, and the negative_prompt describes what you don’t want (it helps avoid weird results). Always be descriptive: adding details like lighting, style, or lens type can significantly improve image quality.
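If you reuse the same style details across many images, it can help to compose prompts from reusable pieces. Here is a tiny helper of my own (not part of diffusers, just a convenience) that joins a subject with a list of style tags:

```python
# A small helper (my own convention, not part of diffusers) for composing
# prompts from a subject plus reusable style tags.
def build_prompt(subject, styles):
    return ", ".join([subject] + list(styles))

style_tags = ["cinematic lighting", "35mm", "ultra-detailed"]
prompt = build_prompt("portrait of a red fox wearing a tiny scarf", style_tags)
print(prompt)
# → portrait of a red fox wearing a tiny scarf, cinematic lighting, 35mm, ultra-detailed
```

This way you can swap subjects while keeping a consistent visual style across a batch of generations.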
Step 3: Generate the Image
Now, let’s create the image with some key parameters:
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,  # 20–35 is a good range
    guidance_scale=7.5,      # 5–9 usually
    height=512,
    width=512,
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]
image.save("generated_image.png")
Here are some key parameters we used:
- num_inference_steps: how many steps the model takes to denoise the image. More steps = more detail, but slower.
- guidance_scale: how strongly the model follows your text prompt. Lower (3–5) = more creative, sometimes unexpected; higher values stick more closely to the prompt.
- manual_seed(42): ensures reproducibility. Same seed + prompt = same image. Change the seed for fresh variations.
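A systematic way to explore these parameters is to sweep over seeds and guidance scales. This sketch only builds and prints the parameter grid, so it runs without the model loaded; the commented lines show where the Step 3 pipe(...) call would go for each combination:

```python
from itertools import product

# Build the parameter grid up front, then render one image per combination.
seeds = [1, 42, 1234]
guidance_scales = [5.0, 7.5, 9.0]
grid = list(product(seeds, guidance_scales))

for seed, gs in grid:
    # With the pipeline from the earlier steps loaded, each combination
    # would be rendered like this:
    # image = pipe(prompt=prompt, negative_prompt=negative_prompt,
    #              num_inference_steps=30, guidance_scale=gs,
    #              generator=torch.Generator(device=device).manual_seed(seed),
    #             ).images[0]
    # image.save(f"fox_seed{seed}_gs{gs}.png")
    print(f"seed={seed}, guidance_scale={gs}")
```

Comparing the nine outputs side by side makes it easy to see how guidance strength and seed choice change the result.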
Here’s the generated image I got as an output:

Final Words
So, with just a few lines of Python, you can turn text into stunning visuals, experiment with styles, and push the limits of imagination. The key is to keep experimenting with prompts, parameters, and samplers until you find results you love. I hope you liked this article on AI Image Generation using Diffusion Models. Feel free to ask your questions in the comments section below. You can follow me on Instagram for many more resources.