Your First Deep Learning Project in Python with Keras: A Step-by-Step Guide

Deep learning has become one of the most popular areas of programming in recent years. It’s getting harder and harder to find a field that doesn’t put this technology to use.

Speech and image recognition. Natural language processing. Big data. All of these areas have made major breakthroughs with the help of deep learning. Thanks to its power, things that were pure fantasy only a few years ago, such as self-driving cars, are now close to reality.

So we think now is the ideal time for developers to get into this space, even if you’re just experimenting. Keras is one of the most powerful deep learning libraries in Python, and it makes it easy for anyone to take advantage of this technology without having to worry about the complex underlying theory.

How do you use Keras? In this tutorial, we’ll give you a brief introduction to deep learning and then jump straight into building DL models with Keras, including detailed implementation examples, without focusing too much on the theoretical aspects.

As you’ll find out by the end of this tutorial, Keras is easy to pick up even for beginners. However, if you already understand some basic machine learning concepts, some of the decisions we make in this tutorial will make more sense to you.

What is Deep Learning?

Deep learning is a subset of machine learning. It mimics the structure of the human brain and its neural networks, enabling machines to derive outputs from raw data.

One of the main drawbacks of traditional machine learning is the need for manual feature extraction. Because traditional ML models can’t handle raw data on their own, we have to extract the important features from the data before passing it to the model.

Programmers need a deep understanding of the problem domain to deduce which features should be extracted and how. Worse, human intervention in this process opens up the possibility of missing high-level features that are important in the raw data.

But deep learning eliminates the need for feature extraction. Neural networks in deep learning models can learn to recognize abstract and implicit patterns in raw data and map their impact on specific outputs on their own. In other words, DL combines feature extraction and classification, and performs them in a single model.

This makes deep learning superior to traditional machine learning at uncovering hidden patterns and features in raw data. As a result, DL models deliver better results than ML models, especially as dataset sizes increase.

The architecture of a typical neural network

A typical neural network in a DL model consists of multiple layers. Each layer contains a set of nodes that hold numeric values (e.g., 0.3, 2.45).

The connections between nodes in two adjacent layers are defined by weights (e.g., 1.2, 5). A weight determines how much the value of a node in the previous layer contributes to the value of a node in the next layer. The weight values are calculated during the training of the neural network.
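To make this concrete, here is a minimal NumPy sketch (an illustration with made-up numbers, not how Keras computes things internally) of how a single node’s value can be derived from the previous layer:

import numpy as np

# Values of three nodes in the previous layer (illustrative numbers)
previous_layer = np.array([0.3, 2.45, 1.0])

# Weights connecting those nodes to one node in the next layer, plus a bias
weights = np.array([1.2, 5.0, -0.7])
bias = 0.1

# Weighted sum of the inputs, passed through a ReLU activation
weighted_sum = np.dot(previous_layer, weights) + bias
node_value = max(0.0, weighted_sum)
print(node_value)  # ≈ 12.01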

The first layer of a neural network is called the input layer. The number of nodes in the input layer depends on the size of the input vector. If the size of the input vector is 12, the size of the input layer should also be 12.

The last layer of a neural network is the output layer. When the model performs a classification task, the output layer should contain a node for each possible classification result.

For example, if you use the DL model to recognize the numbers (from 0 to 9) displayed in the input image, it should have 10 nodes in the output layer. When the model makes a prediction on a given input image, each output node gives the probability that the number in the image is the number represented by that node. The number with the highest probability is then considered the final prediction of the model.
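For example (a tiny sketch with made-up probabilities), picking the final prediction is just a matter of finding the output node with the highest value:

import numpy as np

# Hypothetical probabilities produced by the 10 output nodes for one image
output = np.array([0.01, 0.02, 0.05, 0.01, 0.01, 0.80, 0.04, 0.02, 0.02, 0.02])
print(np.argmax(output))  # 5 -- the model predicts the digit 5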

All other layers between the input and output layers are called the hidden layers of the neural network. These layers and weights that connect their nodes perform mathematical operations to output the final prediction of the model. A neural network can have multiple hidden layers, depending on what it expects to accomplish.

What is a Convolutional Neural Network (CNN)?

Since we’ll be using Keras to build a simple CNN in this tutorial, let’s try to understand how a CNN differs from a regular neural network before proceeding.

A CNN is a type of neural network designed to take an image as input. In other words, when we tweak the properties of a neural network specifically to better handle image input and its intrinsic properties, we call it a CNN.

As a result, CNN architectures include some layer types that conventional neural networks don’t have. Convolutional, pooling, and fully connected layers are a few such examples.

Convolutional layer

Convolutional layers apply filters that use a mathematical operation called convolution to summarize the features within a small area of an image. For example, we can define a convolutional layer with a kernel (think of it as a window) that is 3×3 in size. The kernel slides across the image and applies convolution to the 9 elements it covers at any given time. The layer then produces a new feature map from the convolution results.
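To make the operation concrete, here is a minimal NumPy sketch (plain Python, not Keras code) of a single 3×3 kernel sliding over a grayscale image without padding:

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image and take the sum of the
    # element-wise products at each position (no padding, stride 1)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            output[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return output

image = np.random.rand(28, 28)   # a dummy 28x28 "image"
kernel = np.random.rand(3, 3)    # one 3x3 filter
print(convolve2d(image, kernel).shape)  # (26, 26)

Note how a 3×3 kernel shrinks a 28×28 input to 26×26, which is exactly what you’ll see later in the model summary.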


Pooling layer

The pooling layer is used to reduce the size of the input by removing redundant data. It also makes the model less dependent on the exact position of specific features in the input image, which increases the model’s flexibility.

Pooling divides an image into a set of non-overlapping areas and pools each area’s values into a single value with a simple operation, such as taking the maximum, minimum, or average of the area. Max pooling is the most commonly used pooling technique.
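For example, here is a minimal NumPy sketch of 2×2 max pooling on a small 4×4 input:

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 4, 3, 8]])

# Split the 4x4 input into non-overlapping 2x2 areas and keep each area's maximum
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [7 9]]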

Flatten

A flatten layer simply flattens a multidimensional vector into one long one-dimensional vector. If you pass a 13×13 vector to the flatten layer, it will output a long vector of size 169.
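In NumPy terms, flattening is just a reshape; a quick sketch:

import numpy as np

x = np.random.rand(13, 13)
flattened = x.reshape(-1)
print(flattened.shape)  # (169,)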

Fully connected layer

The fully connected layer combines the individual features identified by the different nodes in the previous layer to paint a bigger picture of the input. Therefore, each node in a fully connected layer is connected to every node in the layer before it.

Keras Usage Tutorial: What is Keras?

In this tutorial, we’ll use Keras to build a deep learning model. Keras is a deep learning library written in Python. It abstracts away the complex logic behind deep learning algorithms and simplifies the construction of new models for beginners. Keras supports multiple backends, including TensorFlow and Theano.

Install and set up Keras

Before installing Keras, you should have Python (versions 3.6-3.8) and pip installed on your system. Then, simply run the following command to start the installation.

pip install keras

Keras is also bundled with the TensorFlow 2 distribution, where you can import it as tensorflow.keras. With this setup, TensorFlow is used as the backend by default.

pip install tensorflow
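Once installed, you can verify the setup with a quick check (assuming a TensorFlow 2.x installation):

import tensorflow as tf
print(tf.__version__)        # the TensorFlow version, e.g. 2.x
print(tf.keras.__version__)  # the bundled Keras version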

To set up the project, we just need to import Numpy and related modules from Keras.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Convolution2D, MaxPooling2D
from tensorflow.keras.utils import to_categorical

How to use Keras: Load a dataset

In this tutorial, we use the MNIST dataset of handwritten digits. It contains 28×28 grayscale images of the handwritten digits 0-9. Its training set contains 60000 images, while the test set contains 10000 images.

The CNN we’re building is designed to take one of these images as input and predict which digit it displays from 10 possible outputs.

Since the MNIST dataset is bundled with the Keras distribution, we can load it directly without any extra work.

from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

The load_data function loads the training and test datasets. Each dataset contains a set of images (X_train, X_test) and a set of labels (y_train, y_test) indicating the digits they display.

If we check the shape of the image data:

print(X_train.shape)  #(60000, 28, 28)
print(X_test.shape) #(10000, 28, 28)

This output confirms that we have 60,000 training images and 10,000 test images at 28×28 size.

We can also plot the images to better understand the dataset.

from matplotlib import pyplot as plt
plt.imshow(X_train[1])
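To get a broader look at the data, you can also plot the first few images with their labels as titles (a small matplotlib sketch):

from matplotlib import pyplot as plt

# Plot the first 9 training images with their labels as titles
fig, axes = plt.subplots(3, 3, figsize=(6, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_train[i], cmap="gray")
    ax.set_title(int(y_train[i]))
    ax.axis("off")
plt.show()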

Keras Tutorial: Preprocessing image data

Before we can provide images to train and test our deep learning models, we need to normalize and reshape them.

Normalizing the image pixel values to between 0 and 1 makes the model easier and faster to train. To normalize without losing data, we should first convert the pixel type to a 32-bit floating-point number.

X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255

The neural network we’re building with Keras requires a 3D image as input. Since the MNIST images are grayscale, we must use the expand_dims method to explicitly add a third dimension with a depth of 1.

X_train = np.expand_dims(X_train, axis=3)
X_test = np.expand_dims(X_test, axis=3)

If we check the new shape of the image dataset after this reshaping step:

print(X_train.shape) #(60000, 28, 28, 1)
print(X_test.shape) #(10000, 28, 28, 1)

We can see that each image now has the required third dimension.

Example of a deep learning implementation: Preprocessing image labels

If we check the shape of the image label variable:

print(y_train.shape) #(60000,)
print(y_test.shape) #(10000,)

You can see that both the test and training label data are stored as one-dimensional arrays.

If we check the values stored in one of the arrays to get a better idea:

print(y_train[1]) #0

As you can see, these label arrays store the digits in the images directly. But our neural network uses 10 nodes in the output layer to identify each digit. Therefore, we must one-hot encode the label data to represent each digit using ten classes. For example, 5 should be encoded as [0, 0, 0, 0, 0, 1, 0, 0, 0, 0].

Keras provides a practical way to perform this task with ease.

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

The shape of the image label dataset now is:

print(y_train.shape) #(60000, 10)
print(y_test.shape) #(10000, 10)
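We can also print a single encoded label to see the effect. Since y_train[1] was 0 before encoding, it should now be a 10-element vector with a 1 in the first position:

print(y_train[1])
# [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]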

How to use Keras: It’s time to create a deep learning model

The model we’re building will consist of 7 layers, including the input layer and the output layer. In practice, deciding the number and type of layers to add to a network takes a lot of experimentation, experience, and math.

In this tutorial, we won’t delve into the theory or spend time experimenting. Instead, we’ll adopt the same architecture that is commonly used when building CNNs.

However, you have complete freedom to tweak this architecture and experiment with different layers to see how they ultimately affect the final result.

When building a model with Keras, we must use either the Sequential class or the Model class as its basis. The Sequential class, which we’ll use here, builds a linear stack of layers. Let’s start by creating an instance of this class.

model = Sequential()

The first layer we add to the model will act as the input layer. In our case, it’s also a 2D convolutional layer.

model.add(Convolution2D(32, kernel_size=(3,3), activation="relu", input_shape=(28, 28, 1)))

Here, we create a convolutional layer that uses 32 3×3 kernels to extract features from the input. It uses ReLU as the activation function.

The next layer of our neural network is a pooling layer with a pool size of 2×2.

model.add(MaxPooling2D(pool_size=(2, 2)))

We then add another convolutional layer and another pooling layer to our model. The additional convolutional layer allows our model to learn higher-level features in the image, while the additional pooling layer further increases the model’s flexibility.

model.add(Convolution2D(64, kernel_size=(3,3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

The next step is to flatten the output of the previous layers using a Flatten layer.

model.add(Flatten())

We add a dropout layer to our model to prevent it from overfitting the training set. The dropout layer deliberately ignores, or “drops out,” the outputs of some of the previous layer’s nodes to avoid overfitting. Our dropout layer uses a dropout rate of 0.5.

model.add(Dropout(0.5))

Finally, we add a fully connected layer to the neural network as an output layer. It uses the softmax activation function to determine the output value.

model.add(Dense(10, activation='softmax'))

That’s it. Our model architecture is now complete. Below, you can see the full architecture of the model in one place.

model = Sequential()

model.add(Convolution2D(32, kernel_size=(3,3), activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, kernel_size=(3,3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dropout(0.5))

model.add(Dense(10, activation='softmax'))

We can use the following function to view a summary of the model.

model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_7 (Conv2D)            (None, 26, 26, 32)        320
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 13, 13, 32)        0
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 11, 11, 64)        18496
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 5, 5, 64)          0
_________________________________________________________________
flatten_4 (Flatten)          (None, 1600)              0
_________________________________________________________________
dropout_4 (Dropout)          (None, 1600)              0
_________________________________________________________________
dense_5 (Dense)              (None, 10)                16010
=================================================================
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
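As a sanity check, you can reproduce these parameter counts yourself: a Conv2D layer has (kernel height × kernel width × input channels + 1 bias) parameters per filter, and a Dense layer has (inputs + 1 bias) parameters per node.

# First conv layer: 32 filters of size 3x3 over 1 input channel
print((3 * 3 * 1 + 1) * 32)    # 320
# Second conv layer: 64 filters of size 3x3 over 32 input channels
print((3 * 3 * 32 + 1) * 64)   # 18496
# Dense output layer: 10 nodes, each connected to 1600 flattened inputs
print((1600 + 1) * 10)         # 16010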

Keras Usage Tutorial: Compiling and Training Models

Before training the model, we should compile it by passing the optimization function (adam, SGD, etc.), the loss function, and the evaluation metric.

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Now, we can train the model using the Keras fit method. It will be trained over 10 epochs with a batch size of 128. We’ll also use 10% of the training data to validate the model.

model.fit(X_train, y_train, batch_size=128, epochs=10, validation_split=0.1)
Epoch 1/10
422/422 [==============================] - 21s 49ms/step - loss: 0.3711 - accuracy: 0.8857 - val_loss: 0.0863 - val_accuracy: 0.9770
Epoch 2/10
422/422 [==============================] - 22s 53ms/step - loss: 0.1151 - accuracy: 0.9651 - val_loss: 0.0564 - val_accuracy: 0.9840
Epoch 3/10
422/422 [==============================] - 24s 57ms/step - loss: 0.0852 - accuracy: 0.9740 - val_loss: 0.0487 - val_accuracy: 0.9873
Epoch 4/10
422/422 [==============================] - 23s 53ms/step - loss: 0.0735 - accuracy: 0.9779 - val_loss: 0.0428 - val_accuracy: 0.9893
Epoch 5/10
422/422 [==============================] - 23s 54ms/step - loss: 0.0642 - accuracy: 0.9799 - val_loss: 0.0418 - val_accuracy: 0.9885
Epoch 6/10
422/422 [==============================] - 23s 54ms/step - loss: 0.0577 - accuracy: 0.9823 - val_loss: 0.0374 - val_accuracy: 0.9898
Epoch 7/10
422/422 [==============================] - 26s 61ms/step - loss: 0.0538 - accuracy: 0.9831 - val_loss: 0.0354 - val_accuracy: 0.9907
Epoch 8/10
422/422 [==============================] - 23s 54ms/step - loss: 0.0485 - accuracy: 0.9852 - val_loss: 0.0375 - val_accuracy: 0.9897
Epoch 9/10
422/422 [==============================] - 22s 53ms/step - loss: 0.0468 - accuracy: 0.9852 - val_loss: 0.0344 - val_accuracy: 0.9908
Epoch 10/10
422/422 [==============================] - 25s 60ms/step - loss: 0.0434 - accuracy: 0.9865 - val_loss: 0.0304 - val_accuracy: 0.9912

That was easy! Now we have a trained deep learning model that achieves more than 99% accuracy on the validation data and can recognize images of handwritten digits.
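If you’d like to visualize how the accuracy improved over the epochs, you can capture the History object that fit returns (an optional sketch; it assumes you assign the return value when calling fit, and note that calling fit again on the same model continues training it):

history = model.fit(X_train, y_train, batch_size=128, epochs=10, validation_split=0.1)

from matplotlib import pyplot as plt
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()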

Example of a deep learning implementation: Evaluating the model

To confirm the accuracy of our model, we can evaluate it using our test dataset.

score = model.evaluate(X_test, y_test, verbose=0)

Now, if we print the evaluation results:

print("accuracy", score[1]) #accuracy 0.9894999861717224
print("loss", score[0]) #loss 0.028314810246229172

Our model is close to 99% accurate on the test dataset. Isn’t it amazing?

Use trained models to make predictions

Finally, we can use our newly trained deep learning model to make predictions. Let’s get the model’s predictions for the first 20 images in the test dataset and compare the results against the original image labels.

predictions = model.predict(X_test[:20])
print("predictions:", np.argmax(predictions, axis=1))
print("labels     :", np.argmax(y_test[:20], axis=1))
predictions: [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]
labels     : [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]

Our model got all of them right!
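If you want to predict a single image, keep in mind that predict expects a batch, so you need to preserve the leading batch dimension (a small sketch):

# Take one test image while keeping the batch dimension: shape (1, 28, 28, 1)
single_image = X_test[0:1]
prediction = model.predict(single_image)
print("predicted digit:", np.argmax(prediction))  # 7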

Summary of the Keras tutorial

Today, we introduced you to an area of programming that has become very popular in the developer community in recent years. Even though it’s a tough field with a lot of math behind it, Keras makes working with neural networks easy, even for complete beginners. We hope this article has inspired you to try deep learning and build interesting and useful AI models of your own.

Thanks for reading!