How do you implement face swapping in Python? This post is the first of three articles in which we will build a mobile app that automates face swapping.
- Part 1: Computer Vision Algorithms
- Part 2: REST API – TBD
- Part 3: Mobile App – TBD
Computer vision has proven to be one of the most revolutionary fields in computer science because it provides solutions to some of the world’s most challenging problems.
It also provides innovative technologies that make a positive contribution to the advancement of technology worldwide. We can say that computer vision is the closest we get to interacting with the physical world in the digital world.
It enables self-driving cars to understand and interpret their surroundings. Computer vision also plays a leading role in augmented reality and virtual reality technologies, and many modern applications of AI are also based on several concepts of computer vision.
This allows AI applications to outperform most classical methods.
Facial recognition is another application of computer vision, one that offers solutions to existing problems in areas such as surveillance and crime and theft prevention.
Facial recognition is, at its core, the use of a camera to recognize a person’s face. A popular example is the face-unlock feature on mobile phones: a face recognition algorithm detects the face, compares it with the face stored in the database, and automatically unlocks the phone if they match.
So how do you build a face swap app? Face swapping is another application of computer vision, and it builds on facial recognition. This article covers the basics of face swapping and how to do it with OpenCV, including a detailed example implementation. The article follows this structure:
- What is Face Swapping?
- How do I make a face swap?
- Landmark Detection Guidelines
- What is Dlib?
- Implementing Face Swapping with OpenCV and Dlib
- Conclusion
What is Face Swapping?
Most of you have probably seen filters that allow you to swap the faces of two people in a photo, or swap faces with celebrities or animals.
Many people encounter these types of filters on social media and apps like Snapchat.
This is a fairly simple concept in theory, but not in practice. The human eye is very effective at facial recognition and can easily detect whether a face is real or fake.
It can also easily detect facial borders and facial features such as the eyes and nose. But getting a computer to interpret all of this information is not as easy as it is for a human.
However, face swapping is not especially difficult compared to other applications of computer vision.
How do I make a face swap?
Since the advent of advanced computer vision technology, there have been several attempts to develop face-swapping technology. Software vendors such as Adobe have integrated face-swapping tools into products like Photoshop.
Similarly, people in the field of artificial intelligence have developed models that enable face swapping. An example is DeepFaceLab, a deep learning model that provides an integrated framework for swapping faces between images.
The Dlib-based method used in this article is also a machine learning solution: it uses ML models to detect landmarks in pictures, which enables facial recognition, and that landmark-based recognition can be extended to swap faces between images. To sum up, face swap methods include:
- Classic Photoshop tools in Adobe software
- AI-based approaches
- DeepFaceLab
- Landmark detection with Dlib
Landmark detection provides the facial recognition part of the swap. After recognition, we extract the face from the first image by removing everything except the face, and in the second image we delete the face.
We then place the face extracted from the first image onto the second image, where the deleted face used to be. On its own this does not give good results, as the contours of the new face will not match or blend with the new image. We therefore smooth the boundaries of the face and do some processing on both images around those boundaries so that the new face blends well and looks more realistic. The sketch below summarizes these steps.
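Here is a minimal sketch of that pipeline. The step functions are passed in as placeholders for the OpenCV and Dlib code we implement later in this article; none of them is a real API:

# High-level face swap pipeline; the step functions are placeholders for
# the OpenCV/Dlib code developed later in this article, not a real API.
def swap_faces(src_img, dst_img, detect_landmarks, extract_face, warp_face, blend):
    src_pts = detect_landmarks(src_img)          # 1. find landmarks on both faces
    dst_pts = detect_landmarks(dst_img)
    patch = extract_face(src_img, src_pts)       # 2. cut the source face out
    warped = warp_face(patch, src_pts, dst_pts)  # 3. align it to the target face
    return blend(dst_img, warped, dst_pts)       # 4. smooth the seams for realism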
Landmark Detection Guidelines
Landmark detection is a computer vision technique in which we detect key points on human faces and track them.
While landmark detection in computer vision is closely associated with faces, it is not limited to them in general. It refers to detecting and identifying objects in an image by creating a bounding box around them. Such an algorithm usually outputs two things for us (sketched below):
- The probability that an object is present.
- If an object is present, the coordinates of the bounding box that surrounds it.
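For illustration, those two outputs might be represented like this; the numbers are made up, not the output of any real model:

# Hypothetical detector output for a single image (illustrative values only)
detection = {
    "probability": 0.97,        # confidence that an object is present
    "box": (54, 38, 120, 150),  # bounding box as (x, y, width, height)
}
print(detection["probability"], detection["box"])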
Typically, the key points in an image that a neural network needs to detect are called landmarks. These landmarks do not have to be the four points of a bounding box; there can be any number of them, depending on the application.
In such an application, we need multiple landmarks to identify an object or surface, and we want the model to output the (x, y) coordinates of the detected landmarks rather than a bounding box.
Let’s look at a concrete example where we want the neural network to recognize the two corners of a human lip. That gives each face two landmarks, so the model outputs four numbers:
- (x1, y1)
- (x2, y2)
This is fairly simple to do. However, if we want our neural network to detect not only the corners of the lips, but also the outline of the entire lips and other key points on the eyes, nose, and face, the number of landmarks grows to n pairs:
(x1, y1), (x2, y2), (x3, y3), …, (xn, yn)
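In code, a set of n landmarks is naturally stored as an (n, 2) array of coordinates. A small illustration with made-up values:

import numpy as np

# n landmark pairs stored as an (n, 2) array of (x, y) coordinates;
# the values are made up for illustration.
landmark_coords = np.array([(130, 85), (170, 86)])  # the two lip corners
print(landmark_coords.shape)  # (2, 2): n = 2 landmarks, 4 numbers in total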
In order to perform such a wide range of operations using a neural network, we first need to decide the location of landmarks, and then label the entire training set with the decided landmarks.
If the training dataset is large, this can become a hectic task. The order of the landmarks also matters: it needs to be consistent throughout the dataset. For example, if you place the first landmark on the tip of the nose, then every image should have its first landmark on the tip of the nose, and so on across the entire dataset.
Once labeling is complete, the training set is ready, and we feed it into a neural network, preferably a convolutional neural network, to start training.
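To make this concrete, here is a minimal sketch of such a network, written with PyTorch (a framework choice we are assuming; the article does not prescribe one). A small CNN takes a 96x96 grayscale face crop and regresses 2n numbers, one (x, y) pair per landmark:

import torch
import torch.nn as nn

n_landmarks = 68  # as in the 68-point model used later in this article

# A hypothetical architecture: two conv blocks, then a regression head
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 24 * 24, 256), nn.ReLU(),
    nn.Linear(256, 2 * n_landmarks),  # outputs (x1, y1, ..., xn, yn)
)

images = torch.randn(8, 1, 96, 96)           # a batch of face crops
targets = torch.randn(8, 2 * n_landmarks)    # labeled landmark coordinates
loss = nn.MSELoss()(model(images), targets)  # regression loss on the coordinates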
Check out our article on detecting facial features with Python to learn more about this interesting topic.
What is Dlib?
Dlib is a modern C++-based toolkit for developing machine learning algorithms. It also contains tools to develop complex software in C++ to solve real-world problems.
The tool is widely used in industry and academia, and is widely used in embedded system design, mobile phones, robotics, and high-performance large-scale computing environments.
In addition, it is an open-source library, which makes it easy to adopt: anyone can use it for any application they want to build.
Unlike many other open-source projects, Dlib provides extensive and precise documentation for every class and module present in the library so that anyone who uses it can understand it simply.
It also has good unit test coverage, as it is regularly tested on MS Windows, macOS, and Linux systems. The library is self-contained: no other packages are required for the normal execution of its code.
It offers a large number of machine learning and numerical algorithms for development. It also contains deep learning tools, so you can use it to create ML models ranging from regression and support vector machines to deep neural networks.
Other tools offered by the library include:
- Graphical model inference algorithms
- Image processing
- Threading
- Networking
- Graphical user interfaces
- Data compression and integrity algorithms
- Testing
Implementing Face Swapping with OpenCV and Dlib
As always, you can access the full code of this article on Google Colab, which we will describe and explain next.
Before we start coding, we need to install Dlib and OpenCV with pip:
pip install dlib
pip install opencv-contrib-python
After installing the libraries, we need to download the pre-trained landmark detection model that we will use with Dlib.
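The 68-point predictor is hosted on dlib.net; one way to fetch and unpack it is shown below (the local filename matches what the later code expects):

# Download and decompress dlib's pre-trained 68-point landmark model
import bz2, urllib.request

url = "http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2"
urllib.request.urlretrieve(url, "shape_predictor_68_face_landmarks.dat.bz2")
with bz2.open("shape_predictor_68_face_landmarks.dat.bz2") as src, \
     open("shape_predictor_68_face_landmarks.dat", "wb") as dst:
    dst.write(src.read())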
Now we can start implementing it.
import cv2
import dlib
import numpy as np
from matplotlib import pyplot as plt
# Loading base images and converting them to grayscale
face = cv2.imread("chris.jpeg")
body = cv2.imread("trump.jpeg")
We start by importing the dependencies: OpenCV, Dlib, matplotlib, and numpy. Matplotlib is used to plot the images, and numpy for mathematical operations. We then use OpenCV’s imread function to load the two images: one for the face and the other for the body.
face_gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
body_gray = cv2.cvtColor(body, cv2.COLOR_BGR2GRAY)
# Create empty matrices in the images' shapes
height, width = face_gray.shape
mask = np.zeros((height, width), np.uint8)
height, width, channels = body.shape
Now we convert both images from color to grayscale. We then initialize an empty mask matrix with the same shape as the face image, and extract the shape of the body image.
# Loading models and predictors of the dlib library to detect landmarks in both faces
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("./shape_predictor_68_face_landmarks.dat")
# Getting landmarks for the face that will be swapped onto the body
rect = detector(face_gray)[0]
# This creates an object with 68 pairs of integer values — these values are the (x, y)-coordinates of the facial structures
landmarks = predictor(face_gray, rect)
landmarks_points = []
def get_landmarks(landmarks, landmarks_points):
    for n in range(68):
        x = landmarks.part(n).x
        y = landmarks.part(n).y
        landmarks_points.append((x, y))
get_landmarks(landmarks, landmarks_points)
points = np.array(landmarks_points, np.int32)
Now we use the pre-trained landmark detector we downloaded to recognize the face in the first image. The predictor returns 68 landmarks that identify the face, and in the for loop we extract the coordinates of each of them.
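As an optional sanity check (not part of the original walkthrough), you can draw the 68 points on a copy of the image to verify the detection:

# Optional: draw the 68 detected landmarks to verify the detection
face_points = face.copy()
for (x, y) in landmarks_points:
    cv2.circle(face_points, (x, y), 3, (0, 0, 255), -1)
plt.imshow(cv2.cvtColor(face_points, cv2.COLOR_BGR2RGB))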
convexhull = cv2.convexHull(points)
face_cp = face.copy()
plt.imshow(cv2.cvtColor((cv2.polylines(face_cp, [convexhull], True, (255,255,255), 3)), cv2.COLOR_BGR2RGB))
cv2.fillConvexPoly(mask, convexhull, 255)  # Fills the hull so the mask keeps only the face region
face_image_1 = cv2.bitwise_and(face, face, mask=mask)
Next, we use OpenCV’s convexHull function to compute and draw the contour around the face detected by the landmark model, and use the filled hull as a mask to extract the face region.
rect = cv2.boundingRect(convexhull)
subdiv = cv2.Subdiv2D(rect) # Creates an instance of Subdiv2D
subdiv.insert(landmarks_points) # Insert points into subdiv
triangles = subdiv.getTriangleList()
triangles = np.array(triangles, dtype=np.int32)
indexes_triangles = []
face_cp = face.copy()
def get_index(arr):
    index = 0
    if arr[0]:
        index = arr[0][0]
    return index

for triangle in triangles:
    # Gets the vertices of the triangle
    pt1 = (triangle[0], triangle[1])
    pt2 = (triangle[2], triangle[3])
    pt3 = (triangle[4], triangle[5])

    # Draws a line for each side of the triangle
    cv2.line(face_cp, pt1, pt2, (255, 255, 255), 3, 0)
    cv2.line(face_cp, pt2, pt3, (255, 255, 255), 3, 0)
    cv2.line(face_cp, pt3, pt1, (255, 255, 255), 3, 0)

    # Maps each vertex back to the index of the matching landmark point
    index_pt1 = get_index(np.where((points == pt1).all(axis=1)))
    index_pt2 = get_index(np.where((points == pt2).all(axis=1)))
    index_pt3 = get_index(np.where((points == pt3).all(axis=1)))

    # Saves the triangle as a triplet of landmark indexes
    if index_pt1 is not None and index_pt2 is not None and index_pt3 is not None:
        vertices = [index_pt1, index_pt2, index_pt3]
        indexes_triangles.append(vertices)
# Draw delaunay triangles
plt.imshow(cv2.cvtColor(face_cp, cv2.COLOR_BGR2RGB))
Here, we compute a Delaunay triangulation over the landmark points: Subdiv2D splits the face into triangles whose vertices are the landmarks, and we store each triangle as a triplet of landmark indexes so the same triangulation can be reproduced on the second face.
# Getting landmarks for the face that will have the first one swapped onto it
rect2 = detector(body_gray)[0]
# This creates an object with 68 pairs of integer values — these values are the (x, y)-coordinates of the facial structures
landmarks_2 = predictor(body_gray, rect2)
landmarks_points2 = []
# Uses the function declared previously to get a list of the landmark coordinates
get_landmarks(landmarks_2, landmarks_points2)
# Generates a convex hull for the second person
points2 = np.array(landmarks_points2, np.int32)
convexhull2 = cv2.convexHull(points2)
body_cp = body.copy()
plt.imshow(cv2.cvtColor((cv2.polylines(body_cp, [convexhull2], True, (255,255,255), 3)), cv2.COLOR_BGR2RGB))
Now we perform the same landmark detection and convex hull computation on the second image.
lines_space_new_face = np.zeros((height, width, channels), np.uint8)
body_new_face = np.zeros((height, width, channels), np.uint8)
height, width = face_gray.shape
lines_space_mask = np.zeros((height, width), np.uint8)
for triangle in indexes_triangles:
    # Coordinates of the first person's delaunay triangles
    pt1 = landmarks_points[triangle[0]]
    pt2 = landmarks_points[triangle[1]]
    pt3 = landmarks_points[triangle[2]]

    # Gets the delaunay triangles
    (x, y, width, height) = cv2.boundingRect(np.array([pt1, pt2, pt3], np.int32))
    cropped_triangle = face[y: y+height, x: x+width]
    cropped_mask = np.zeros((height, width), np.uint8)

    # Fills triangle to generate the mask
    points = np.array([[pt1[0]-x, pt1[1]-y], [pt2[0]-x, pt2[1]-y], [pt3[0]-x, pt3[1]-y]], np.int32)
    cv2.fillConvexPoly(cropped_mask, points, 255)

    # Draws lines for the triangles
    cv2.line(lines_space_mask, pt1, pt2, 255)
    cv2.line(lines_space_mask, pt2, pt3, 255)
    cv2.line(lines_space_mask, pt1, pt3, 255)
    lines_space = cv2.bitwise_and(face, face, mask=lines_space_mask)

    # Coordinates of the second person's delaunay triangles
    pt1 = landmarks_points2[triangle[0]]
    pt2 = landmarks_points2[triangle[1]]
    pt3 = landmarks_points2[triangle[2]]

    # Gets the delaunay triangles
    (x, y, width, height) = cv2.boundingRect(np.array([pt1, pt2, pt3], np.int32))
    cropped_mask2 = np.zeros((height, width), np.uint8)

    # Fills triangle to generate the mask
    points2 = np.array([[pt1[0]-x, pt1[1]-y], [pt2[0]-x, pt2[1]-y], [pt3[0]-x, pt3[1]-y]], np.int32)
    cv2.fillConvexPoly(cropped_mask2, points2, 255)

    # Deforms the triangles to fit the subject's face: https://docs.opencv.org/3.4/d4/d61/tutorial_warp_affine.html
    points = np.float32(points)
    points2 = np.float32(points2)
    M = cv2.getAffineTransform(points, points2)  # Warps the content of the first triangle to fit in the second one
    dist_triangle = cv2.warpAffine(cropped_triangle, M, (width, height))
    dist_triangle = cv2.bitwise_and(dist_triangle, dist_triangle, mask=cropped_mask2)

    # Joins all the distorted triangles to make the face mask fit the second person's features
    body_new_face_rect_area = body_new_face[y: y+height, x: x+width]
    body_new_face_rect_area_gray = cv2.cvtColor(body_new_face_rect_area, cv2.COLOR_BGR2GRAY)

    # Creates a mask so new triangles do not overwrite already-filled pixels
    masked_triangle = cv2.threshold(body_new_face_rect_area_gray, 1, 255, cv2.THRESH_BINARY_INV)
    dist_triangle = cv2.bitwise_and(dist_triangle, dist_triangle, mask=masked_triangle[1])

    # Adds the piece to the face mask
    body_new_face_rect_area = cv2.add(body_new_face_rect_area, dist_triangle)
    body_new_face[y: y+height, x: x+width] = body_new_face_rect_area
plt.imshow(cv2.cvtColor(body_new_face, cv2.COLOR_BGR2RGB))
Here, we warp each triangle of the face in the first image to the orientation and size of the corresponding triangle in the second image, so that the source face fits the target face.
body_face_mask = np.zeros_like(body_gray)
body_head_mask = cv2.fillConvexPoly(body_face_mask, convexhull2, 255)
body_face_mask = cv2.bitwise_not(body_head_mask)
body_maskless = cv2.bitwise_and(body, body, mask=body_face_mask)
result = cv2.add(body_maskless, body_new_face)
plt.imshow(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
We now replace the face in the second image with the warped face from the first image.
# Gets the center of the face for the body
(x, y, width, height) = cv2.boundingRect(convexhull2)
center_face2 = (int((x+x+width)/2), int((y+y+height)/2))
seamlessclone = cv2.seamlessClone(result, body, body_head_mask, center_face2, cv2.NORMAL_CLONE)
plt.imshow(cv2.cvtColor(seamlessclone, cv2.COLOR_BGR2RGB))
cv2.imwrite("./result.png", seamlessclone)
Finally, we use OpenCV’s seamlessClone function to blend the boundaries of the new face into the body image.
Conclusion
Computer vision is a broad field with a large number of applications. Face swapping is one of them.
Modern computer vision based on neural networks and deep learning is the most effective way to develop face-swapping models, and Dlib and OpenCV can be used to build them.
Dlib is a C++-based toolkit that can be used to develop machine learning models, and it ships a pre-trained landmark detection model for facial recognition. This demonstrates the potential of machine-learning-based computer vision solutions.
If you liked this article and want to learn more about computer vision applications, check out our other research:
- Use OpenCV for object tracking
- Use AI to remove the background
- Use deep learning to generate images
- Detect facial features with Python