Neural Networks Guide

Building neural networks involves several key steps and decisions based on the task at hand and the nature of the data. If you have used a neural network architecture to solve a problem without understanding how it works, this article is for you. I’ll walk you through how to build and understand the most common neural network architectures.

To get started with this neural networks guide, make sure you have TensorFlow installed in your Python virtual environment, since every example below uses its Keras API. Here are the Python libraries I will be using in this guide:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, LSTM, SimpleRNN, Embedding

We will look at the most commonly used types of neural networks:

MLP (Multilayer Perceptron)
CNN (Convolutional Neural Network)
LSTM (Long Short-Term Memory)

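So, I’ll now create a sample dataset for each type of neural network mentioned above, shaped according to the kind of problem it is used for. The values are random, so the models will not learn anything meaningful from them; the datasets only serve to show the input and label shapes each architecture expects: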

X_mlp = np.random.rand(1000, 10)  # 1000 samples, 10 features
y_mlp = np.random.randint(2, size=(1000,))  # Binary classification labels

X_cnn = np.random.rand(1000, 28, 28, 3)  # 1000 RGB images of size 28x28
y_cnn = np.random.randint(10, size=(1000,))  # 10 classes for image classification

X_rnn = np.random.randint(1000, size=(1000, 10, 1))  # 1000 sequences of length 10, 1 integer feature (0-999) per time step
y_rnn = np.random.randint(2, size=(1000,))  # Binary classification labels

Building and Understanding Neural Networks

MLPs

Here’s how to build MLPs:

model_mlp = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

model_mlp.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

model_mlp.fit(X_mlp, y_mlp, epochs=10, batch_size=32)

Here we’re using the Sequential model API to create a linear stack of layers. In the above code, the first layer specifies input_shape=(10,), indicating that it expects input data with 10 features. It has 64 units (neurons) and uses the ReLU (Rectified Linear Unit) activation function, which is commonly used in hidden layers to introduce non-linearity.

The second layer has 32 units and also uses ReLU. The output layer has a single unit with a sigmoid activation because this is a binary classification problem: the sigmoid function squashes the output into the range between 0 and 1, so it can be read as the probability of the positive class.
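To make the layer-by-layer description concrete, here is a minimal NumPy sketch of the same 10 → 64 → 32 → 1 forward pass. It is not how Keras implements the model internally; the weights are random placeholders rather than trained values, and the names relu, sigmoid, and mlp_forward are my own for illustration.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and replaces negatives with zero
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Placeholder weights and zero biases for a 10 -> 64 -> 32 -> 1 network
# (illustrative only; a trained model would have learned values)
W1, b1 = rng.normal(scale=0.1, size=(10, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 32)), np.zeros(32)
W3, b3 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)

def mlp_forward(x):
    h1 = relu(x @ W1 + b1)         # first hidden layer, 64 units
    h2 = relu(h1 @ W2 + b2)        # second hidden layer, 32 units
    return sigmoid(h2 @ W3 + b3)   # output layer, 1 unit

batch = rng.random((5, 10))        # 5 samples, 10 features each
probs = mlp_forward(batch)
print(probs.shape)                 # one value per sample

# Trainable parameters: every weight matrix plus its bias vector
n_params = sum(W.size + b.size for W, b in [(W1, b1), (W2, b2), (W3, b3)])
print(n_params)
```

The parameter count printed at the end should match the total that model_mlp.summary() reports for the Keras model above.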

Keep these points in mind while solving a problem with MLP:

While working on different problems, you can add more layers or units to increase the model’s capacity if you have a more complex problem or dataset.
Depending on the problem and data, you should choose different activation functions. For example, for regression problems, linear activation might be more suitable in the output layer.
Change the loss function according to the nature of your problem. For example, for regression tasks, Mean Squared Error (MSE) loss might be more appropriate.
Adjust the batch size and number of epochs based on the size of your dataset and computational resources. Larger models or larger individual samples (such as high-resolution images) might require smaller batch sizes to fit into memory, and harder problems might need more epochs to converge.
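As an illustration of the points above about activation and loss functions, here is a sketch of the same MLP adapted to a regression task. The dataset (X_reg, y_reg) and the name model_reg are hypothetical and made up for this example; the only changes from the classifier are the linear output activation, the MSE loss, and the MAE metric.

```python
import numpy as np
import tensorflow as tf

# Hypothetical regression data: 200 samples, 10 features, continuous target
X_reg = np.random.rand(200, 10).astype("float32")
y_reg = X_reg.sum(axis=1, keepdims=True)  # target is the sum of the features

model_reg = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),  # unbounded real-valued output
])

# Mean Squared Error as the loss, Mean Absolute Error as a readable metric
model_reg.compile(optimizer="adam", loss="mse", metrics=["mae"])

history = model_reg.fit(X_reg, y_reg, epochs=5, batch_size=32, verbose=0)
print(history.history["loss"][-1])  # training loss after the last epoch
```

Accuracy is not a meaningful metric for continuous targets, which is why this sketch tracks MAE instead.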

CNNs

Here’s how to build CNNs:

model_cnn = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

model_cnn.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

model_cnn.fit(X_cnn, y_cnn, epochs=10, batch_size=32)

The first layer is a convolutional layer, created with the Conv2D class. It slides a set of learned filters over the input to detect local features such as edges and textures. Here, 32 is the number of filters and (3, 3) is the size of each filter. The layer uses the ReLU activation function, which introduces non-linearity and helps the model learn complex patterns in the data. The input shape (28, 28, 3) indicates that the input images are 28×28 pixels with 3 channels (RGB).

Next, the pooling layer reduces the spatial dimensions of the feature maps produced by the convolutional layer, which lowers the computational cost of the layers that follow. Here, (2, 2) is the pool size for the max pooling operation, which halves each spatial dimension (in this model, from 26×26 to 13×13).

Next, the flatten layer converts the multi-dimensional feature maps into a one-dimensional vector per image, preparing the data for the fully connected layers. It is needed because the dense layers that follow should treat each image as a single feature vector.
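The shapes flowing through the convolution, pooling, and flatten layers can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming the Keras defaults used in the model above (stride 1 and 'valid' padding for Conv2D, and a pooling stride equal to the pool size). The helper names are my own.

```python
def conv_output_size(size, kernel, stride=1):
    # 'valid' padding (no padding): the kernel must fit entirely inside the input
    return (size - kernel) // stride + 1

def pool_output_size(size, pool):
    # Non-overlapping pooling: the stride equals the pool size
    return (size - pool) // pool + 1

side = 28       # input images are 28x28
filters = 32    # number of filters in the Conv2D layer

after_conv = conv_output_size(side, kernel=3)
after_pool = pool_output_size(after_conv, pool=2)
flattened = after_pool * after_pool * filters

print((after_conv, after_conv, filters))  # feature maps after Conv2D
print((after_pool, after_pool, filters))  # feature maps after MaxPooling2D
print(flattened)                          # vector length after Flatten

# Weights plus biases of the Dense(64) layer that receives the flattened vector
dense_params = flattened * 64 + 64
print(dense_params)
```

Most of the model's parameters sit in the first dense layer, which is one reason pooling before flattening matters.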

After flattening, there are two dense layers. The first dense layer has 64 units with ReLU activation, which allows the model to learn combinations of the features in the flattened vector. The second dense layer has 10 units, one for each class in the classification task, and uses the softmax activation function. Softmax produces a probability distribution over the classes, making it suitable for multi-class classification.
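Here is a small NumPy sketch of what softmax does to the raw scores (logits) of the output layer. The logits are made-up values, and I use 3 classes instead of 10 to keep the example short. The last lines show the loss that sparse_categorical_crossentropy would assign to one sample, assuming its true label is class 0.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum avoids overflow and does not change the result
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw outputs for 2 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])

probs = softmax(logits)
predicted = np.argmax(probs, axis=-1)  # class with the highest probability

print(probs.sum(axis=-1))  # each row sums to 1
print(predicted)

# Cross-entropy for the first sample if its true label is class 0
true_label = 0
loss = -np.log(probs[0, true_label])
print(loss)
```

Equal logits, as in the second row, give every class the same probability, which is what an untrained or uninformative model tends to produce.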

Keep these points in mind while solving a problem with CNNs:

Adjust the filter size and number of filters in the convolutional layers based on the complexity of the features in the input data.
Experiment with different pooling operations (e.g., average pooling) and pool sizes to balance computational complexity and information retention.
Add more dense layers or adjust the number of units in the dense layers based on the complexity of the classification problem.
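To see the difference between max pooling and average pooling mentioned above, here is a NumPy sketch that applies both to a small, made-up feature map. The pool2d helper is my own illustration of non-overlapping pooling, not the Keras implementation.

```python
import numpy as np

def pool2d(feature_map, pool=2, mode="max"):
    h, w = feature_map.shape
    h_out, w_out = h // pool, w // pool
    # Drop edge rows/columns that do not fill a complete window,
    # then group the map into (pool x pool) blocks
    trimmed = feature_map[:h_out * pool, :w_out * pool]
    blocks = trimmed.reshape(h_out, pool, w_out, pool)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

# Hypothetical 4x4 feature map
fm = np.array([[1, 2, 0, 0],
               [3, 4, 0, 8],
               [5, 5, 1, 1],
               [5, 5, 1, 1]], dtype=float)

max_pooled = pool2d(fm, pool=2, mode="max")
avg_pooled = pool2d(fm, pool=2, mode="avg")

print(max_pooled)  # keeps the strongest activation in each block
print(avg_pooled)  # smooths each block into its mean
```

In the top-right block, max pooling preserves the single strong activation while average pooling dilutes it, which is the trade-off to weigh when choosing between the two.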

LSTMs

Here’s how to build LSTMs:

model_lstm = Sequential([
    LSTM(64, input_shape=(10, 1)),
    Dense(1, activation='sigmoid')
])

model_lstm.compile(optimizer='adam',
                   loss='binary_crossentropy',
                   metrics=['accuracy'])

model_lstm.fit(X_rnn, y_rnn, epochs=10, batch_size=32)

The LSTM layer is a recurrent neural network (RNN) layer with memory cells that allow the model to retain information over time. 64 represents the number of memory units or neurons in the LSTM layer. input_shape=(10, 1) specifies the shape of the input data. Here, 10 is the sequence length, and 1 is the number of features per time step. Input data is expected to be in the shape (batch_size, sequence_length, num_features).
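The number of trainable parameters in the LSTM layer follows directly from the number of units and features. The sketch below computes it in plain Python, assuming a standard LSTM with bias terms (the Keras default); the function name is my own.

```python
def lstm_param_count(units, features):
    # An LSTM has four sets of weights (input gate, forget gate,
    # cell candidate, output gate). Each set has input weights,
    # recurrent weights, and a bias vector.
    per_gate = units * features + units * units + units
    return 4 * per_gate

units, features = 64, 1

lstm_params = lstm_param_count(units, features)
dense_params = units * 1 + 1   # Dense(1) on top of the 64-dimensional LSTM output

print(lstm_params)
print(lstm_params + dense_params)  # total for the model above
```

These numbers should match what model_lstm.summary() reports. Note that the sequence length (10) does not appear in the formula, because the same weights are reused at every time step.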

Following the LSTM layer, there is a dense layer with a single neuron and a sigmoid activation. Sigmoid activation is commonly used in binary classification tasks because its output can be interpreted as the probability of the positive class.

Keep these points in mind while solving a problem with LSTMs:

Adjust the number of memory units in the LSTM layer based on the complexity of the temporal patterns in the data. More complex patterns may require more units.
Modify the sequence length in the input shape based on the temporal context required for the problem. Longer sequences can capture more temporal dependencies but may increase computational complexity.
Experiment with different activation functions in the dense layer based on the problem requirements. For example, softmax activation can be used for multi-class classification tasks.
You can add additional LSTM layers or other types of recurrent layers (such as GRU) to increase the model’s capacity and capture more complex temporal patterns.
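As a sketch of the last point, here is one way to stack recurrent layers in Keras. The data (X_seq, y_seq) and the name model_stacked are made up for this example, and the layer sizes are arbitrary. The detail to watch is return_sequences=True on every recurrent layer except the last, so that the next recurrent layer receives a full sequence rather than only the final output.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 64 sequences of length 10 with 1 feature per time step
X_seq = np.random.rand(64, 10, 1).astype("float32")
y_seq = np.random.randint(2, size=(64,))

model_stacked = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 1)),
    # return_sequences=True emits an output at every time step,
    # giving the next recurrent layer a 3D input (batch, steps, units)
    tf.keras.layers.LSTM(64, return_sequences=True),
    # The last recurrent layer returns only its final output (batch, units)
    tf.keras.layers.GRU(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model_stacked.compile(optimizer="adam",
                      loss="binary_crossentropy",
                      metrics=["accuracy"])

model_stacked.fit(X_seq, y_seq, epochs=1, batch_size=32, verbose=0)
preds = model_stacked.predict(X_seq[:4], verbose=0)
print(preds.shape)
```

For a multi-class problem, the final layer would instead be Dense(num_classes, activation="softmax") with a categorical cross-entropy loss.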

I hope you now understand how to build these neural network architectures.

Summary

Building neural networks involves several key steps and decisions based on the task at hand and the nature of the data: an MLP for tabular features, a CNN for images, and an LSTM for sequences. If you have used a neural network architecture before without understanding how it worked, I hope this article helped you understand it better.

I hope you liked this guide to neural networks. Feel free to ask questions in the comments section below.