Image Classification Model with Deep Learning

If you are a beginner in Machine Learning, you have probably heard of Deep Learning, which is used to work with large datasets of tabular, textual, audio, and image data. If you want to learn Deep Learning hands-on, this article is for you. In this article, I'll take you through building an Image Classification model using Deep Learning, step by step, so you can understand Deep Learning practically.

Introducing the Dataset: Fashion MNIST

Fashion MNIST is a dataset of 70,000 grayscale images of clothing items. Each image is 28×28 pixels and labelled into one of 10 categories, such as T-shirts, Trousers, Sneakers, and more. This dataset contains:

  1. 60,000 images for training
  2. 10,000 images for testing
  3. 10 classes in total

Think of it as a more practical version of the classic handwritten digit MNIST dataset but with clothing items!

Building an Image Classification Model with Deep Learning

We’ll be using TensorFlow + Keras, one of the most beginner-friendly and powerful libraries for deep learning. So, let’s get started by importing the necessary Python libraries:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.metrics import classification_report, confusion_matrix

Now, let’s load the dataset:

fashion_mnist = keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

Let’s take a look at the shape of our data:

print("Training shape:", x_train.shape)
print("Test shape:", x_test.shape)
Training shape: (60000, 28, 28)
Test shape: (10000, 28, 28)
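Fashion MNIST is also balanced: each of the 10 classes has exactly 6,000 training images. A quick way to verify class balance on any label array is `np.bincount`. Here is a minimal sketch on synthetic labels (so the snippet runs standalone); with the real `y_train`, every count comes out to 6000:

```python
import numpy as np

# Synthetic stand-in for y_train: 10 classes, 6000 labels each, shuffled
labels = np.repeat(np.arange(10), 6000)
np.random.shuffle(labels)

counts = np.bincount(labels, minlength=10)
print(dict(enumerate(counts)))  # each class appears 6000 times here
```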

It’s always good to see what your model will be working with. So, let’s visualize a sample of the data:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Fashion MNIST class names (index matches the integer labels)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

fig = make_subplots(rows=2, cols=5, subplot_titles=[class_names[y_train[i]] for i in range(10)])

for i in range(10):
    row = i // 5 + 1
    col = i % 5 + 1

    fig.add_trace(go.Heatmap(z=x_train[i], colorscale='gray'), row=row, col=col)

    fig.update_xaxes(showticklabels=False, row=row, col=col)
    # reverse the y-axis so the images are not drawn upside down
    fig.update_yaxes(showticklabels=False, autorange='reversed', row=row, col=col)

fig.update_layout(height=600, width=1000, title_text="Sample Images")
fig.show()

Preprocessing the Data

Before feeding data into a neural network, you need to normalize and reshape it. So, let’s preprocess the data with these steps:

x_train = x_train / 255.0
x_test = x_test / 255.0

x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

Here, we started by normalizing the pixel values using x_train / 255.0. Why? Because the raw pixel values in grayscale images range from 0 to 255, and feeding that scale into a neural network is asking for trouble. Deep learning models train more efficiently when input values are in a small, consistent range, ideally [0, 1]. It speeds up convergence and helps with numerical stability during backpropagation.

Next, we reshaped the input from (28, 28) to (28, 28, 1). Now technically, nothing’s changed visually, we’re still working with grayscale images. But CNNs expect input with three dimensions per image: height, width, and number of channels. For RGB, that would be 3; for grayscale, it’s 1. So, adding that last dimension is just making sure the model knows it’s looking at 2D data with a single channel.
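To make the effect of these two steps concrete, here is a minimal numpy sketch on a fake batch of images (random uint8 values standing in for real pixels):

```python
import numpy as np

# Fake batch: 4 grayscale images, 28x28, raw uint8 pixels in [0, 255]
batch = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Step 1: normalize to [0, 1] (division also promotes uint8 to float)
batch = batch / 255.0
print(batch.min() >= 0.0 and batch.max() <= 1.0)  # True

# Step 2: add the channel dimension CNNs expect
batch = batch.reshape(-1, 28, 28, 1)
print(batch.shape)  # (4, 28, 28, 1)
```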

Building a Convolutional Neural Network

CNNs are designed to work with images. They extract patterns like edges, textures, and shapes, making them perfect for tasks like this. So, let’s build a CNN:

def create_model():
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D(2, 2),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D(2, 2),
        layers.Flatten(),
        layers.Dropout(0.3),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    return model

model = create_model()
model.summary()

Here, we defined a straightforward but solid CNN architecture using Keras’s Sequential API. The model starts with two convolutional blocks, first with 32 filters, then 64, each followed by max pooling to downsample spatial dimensions and reduce computation. After flattening the output, we added a dropout layer to prevent overfitting by randomly deactivating 30% of the neurons during training.

Then we have a fully connected dense layer with 128 ReLU units to learn non-linear patterns, and finally, a softmax output layer with 10 units for multiclass classification (one for each clothing category). We compiled the model using the Adam optimizer for adaptive learning, sparse_categorical_crossentropy since we’re working with integer labels, and accuracy as our evaluation metric, pretty standard for image classification.
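If you want to sanity-check where the layer sizes in `model.summary()` come from without running TensorFlow, you can trace the spatial dimensions by hand: a 3×3 convolution with 'valid' padding shrinks each side by 2, and 2×2 max pooling halves it (rounding down). A small sketch, assuming the architecture above:

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping pooling, stride = pool size
    return size // pool

s = 28
s = conv_out(s)   # 26 after Conv2D(32, 3x3)
s = pool_out(s)   # 13 after MaxPooling2D
s = conv_out(s)   # 11 after Conv2D(64, 3x3)
s = pool_out(s)   # 5 after the second MaxPooling2D

flat = s * s * 64
print(flat)  # 1600 units feed into the Dense(128) layer
```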

Now, let’s add callbacks. Callbacks like EarlyStopping and ModelCheckpoint make training smarter by stopping when the model stops improving and saving the best version of it:

early_stop = keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)
checkpoint = keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True)

Next, we will train the model:

history = model.fit(
    x_train, y_train,
    epochs=20,
    batch_size=64,
    validation_split=0.2,
    callbacks=[early_stop, checkpoint]
)

Here, your model is learning how to classify clothes!
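Once training finishes, `history.history` is a plain dict mapping metric names to per-epoch lists, which you can use to find the best epoch. A sketch on a made-up history dict, since the real numbers depend on your run:

```python
# history.history maps metric names to per-epoch lists; the values below are made up
hist = {
    "accuracy":     [0.82, 0.88, 0.90, 0.91],
    "val_accuracy": [0.85, 0.89, 0.90, 0.89],
}

# index of the epoch with the highest validation accuracy
best_epoch = max(range(len(hist["val_accuracy"])),
                 key=lambda i: hist["val_accuracy"][i])
print(f"Best val accuracy {hist['val_accuracy'][best_epoch]:.2f} at epoch {best_epoch + 1}")
```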

Evaluating the Model

Let’s test how well the model performs on unseen data:

test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"\nTest Accuracy: {test_acc:.4f}")
313/313 ━━━━━━━━━━━━━━━━━━━━ 5s 16ms/step - accuracy: 0.9066 - loss: 0.2523

Test Accuracy: 0.9084

The output tells us that the model achieved an accuracy of around 90.8%, which means it correctly classified about 9 out of every 10 test images, which is pretty solid for a baseline CNN on Fashion MNIST. The loss value of 0.2523 reflects how confident and well-calibrated the model's predictions were; lower is better.
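Accuracy here is simply the fraction of test images whose highest-probability prediction matches the label. A minimal sketch with toy softmax outputs:

```python
import numpy as np

# Toy "softmax outputs" for 4 samples over 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],   # predicts class 0
    [0.1, 0.8, 0.1],   # predicts class 1
    [0.3, 0.3, 0.4],   # predicts class 2
    [0.6, 0.3, 0.1],   # predicts class 0
])
labels = np.array([0, 1, 1, 0])

preds = probs.argmax(axis=1)        # [0, 1, 2, 0]
accuracy = (preds == labels).mean()
print(accuracy)  # 0.75 -- 3 of 4 correct
```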

Now, let’s have a look at the classification report:

y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)

print(classification_report(y_test, y_pred_classes, target_names=class_names))

The overall accuracy sits at 91%, which aligns with what we saw earlier, but this view gives more nuance. Classes like Trousers, Sandals, Bag, and Ankle Boots performed exceptionally well, with precision, recall, and F1-scores close to or above 0.98, meaning the model is both accurate and consistent for these categories. However, Shirt stands out as the weakest, with an F1-score of just 0.72, likely due to visual similarity with T-shirts, Pullovers, and Coats, which often confuses models in this dataset. The macro and weighted averages confirm balanced performance across classes.
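Per-class precision, recall, and F1 come straight from the confusion counts. A sketch with hypothetical counts for a single class (the numbers below are invented to land near the Shirt class's 0.72 F1-score, not taken from the actual run):

```python
# Hypothetical confusion counts for one class (e.g. 'Shirt')
tp, fp, fn = 720, 280, 280

precision = tp / (tp + fp)  # of everything predicted 'Shirt', how much was right
recall = tp / (tp + fn)     # of all actual 'Shirt' images, how many we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.72 0.72 0.72
```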

Next, you can work on a real-world project to solve a problem using Deep Learning. Here are some projects you should try:

  1. Fashion Recommendations using Image Features
  2. Next Word Prediction Model

Summary

So, this is how to build an image classification model using Deep Learning with CNNs and TensorFlow. Here, you learned how to:

  1. Load and understand real-world image data
  2. Preprocess it for deep learning
  3. Build and train a CNN using TensorFlow
  4. Evaluate model performance

I hope you liked this article on building an image classification model with deep learning. Feel free to ask your questions in the comments section below. You can follow me on Instagram for many more resources.

Aman Kharwal

AI/ML Engineer | Published Author. My aim is to decode data science for the real world in the most simple words.