Understanding Facial Recognition

Facial recognition technology has become an integral part of our daily lives due to its simplicity and practicality. We use it to unlock our phones, access mobile banking apps, and even secure our homes. In this article, we’ll explore the different ways facial recognition is used, how it works, and the challenges it faces. We’ll also take a closer look at FaceNet, a convolutional neural network (CNN) algorithm developed by Google in 2015. Finally, we’ll explore the technology’s fascinating connection to neuroscience.

Authors: Matteo Mello, Riccardo Scibetta

Applications

Many applications belong to the realm of cybersecurity and identity verification: facial recognition is often considered more secure than one-time passwords or two-factor authentication because it does not rely on credentials that can be stolen by hackers. A few examples: smartphones and tablets, banking systems, airport security, and border control.

Further, it can be used by authorities such as police and intelligence services to identify suspects. This has often raised questions about the ethics of applying it extensively to the general public. Countries have adopted various approaches: Belgium and Luxembourg have banned it; Italy has implemented strict policies restricting its use to criminal investigations; China, Russia, and Argentina use it extensively, often with invasive outcomes.

Dozens of other applications have been developed, from systems recognizing problem gamblers at slot machines to dating sites matching people with compatible facial features. Some even predict that the future of advertising is facial recognition, with dynamic ads that adjust to appeal to a person’s interests the moment they notice the ad.

How it Works

Computer vision automates the extraction, analysis, classification, and understanding of useful information from image data. Image data can take various forms, including single images, video sequences and three-dimensional data.

From this image data, the facial recognition system maps and interprets the shape of the face and its expressions, identifying the key elements that distinguish it from other objects. In general, facial recognition technology analyzes the following elements:

  • Distance between the eyes
  • Distance between the forehead and chin
  • Distance between the nose and mouth
  • Depth of the eye socket
  • Shape of the cheekbones
  • Contour of the lips, ears, and chin
[Image from MIT News: https://news.mit.edu/2022/optimized-solution-face-recognition-0406 ]

Finally, the system converts this facial data into a series of numbers, or points, that form a facial imprint. Each person has a unique facial imprint, similar to a fingerprint.

Challenges

Many challenges and complications are involved in applying facial recognition in real environments:

  • Subtle changes in lighting conditions can pose challenges for automated facial recognition algorithms, potentially distorting results even when the person’s pose and expression remain similar. Sometimes, two images of the same face under different lighting appear more distinct than two different faces under the same lighting.
  • Facial recognition algorithms are sensitive to angles and poses. Changes in head movements or camera positions can alter facial appearance, affecting recognition accuracy. For instance, if a database lacks diverse angles, recognition may fail for faces with higher rotation angles.
  • Facial expressions, from macro (happy, sad, angry) to micro (rapid facial movements), further complicate recognition, as emotional states influence expressions. Additionally, makeup and accessories like glasses can hinder recognition.
  • Resolution matters. Low-resolution images, such as common CCTV footage, lack detail and hinder accurate analysis. Effective analysis typically requires images larger than 50×50 pixels.

Algorithm

One of the most popular algorithms for face recognition is FaceNet, developed by Google in 2015. It is based on a convolutional neural network (CNN) and enables face verification, recognition, and clustering. The essence of the FaceNet algorithm lies in embedding all images into a Euclidean space, where the distance between two images indicates their similarity: the smaller the distance, the greater the similarity. By defining a threshold \(d\), all pairs of images with distance less than \(d\) are considered to be of the same identity. The most challenging aspect is finding the right embedding function, which assigns a vector in the Euclidean space to each image. This embedding must ensure that images of the same identity have small distances while those of different identities have large distances. FaceNet learns this mapping and produces the embeddings directly, rather than using an additional layer for recognition or verification.
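A minimal sketch of this thresholding idea, using made-up toy embeddings (real FaceNet embeddings are 128-dimensional vectors produced by the trained CNN; the vectors and threshold here are purely illustrative):

```python
import numpy as np

# Toy 4-D embeddings standing in for f(x); real FaceNet outputs are 128-D.
emb_person_a_1 = np.array([0.10, 0.90, 0.30, 0.50])  # image 1 of person A
emb_person_a_2 = np.array([0.12, 0.88, 0.28, 0.52])  # image 2 of person A
emb_person_b   = np.array([0.80, 0.20, 0.70, 0.10])  # image of person B

def same_identity(e1, e2, d=0.5):
    """Two images depict the same identity if the Euclidean
    distance between their embeddings is below the threshold d."""
    return float(np.linalg.norm(e1 - e2)) < d

print(same_identity(emb_person_a_1, emb_person_a_2))  # True: small distance
print(same_identity(emb_person_a_1, emb_person_b))    # False: large distance
```

All the modeling effort goes into learning the embedding function itself; once it exists, verification reduces to this single distance comparison.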

To find the embedding, a loss function is defined and then minimized using Stochastic Gradient Descent (SGD).

Loss Function

To understand the choice of the loss function, let’s look at what we aim to achieve: faces of the same person should have small distances, while faces of different people should have large distances.
We define the following quantities:
\(f(x_i^a)\) = the embedding of the anchor face, i.e. the face we use as a reference
\(f(x_i^p)\) = the embedding of a positive face, one belonging to the same person as the anchor
\(f(x_i^n)\) = the embedding of a negative face, one belonging to a different person
\(\alpha\) = a margin enforced between positive and negative pairs
Therefore it must hold that:
$$\|f(x_i^a)-f(x_i^p)\|_2^2+\alpha<\|f(x_i^a)-f(x_i^n)\|_2^2,\quad\forall(f(x_i^a),f(x_i^p),f(x_i^n))\in\mathcal T.$$
The loss function will be:
$$\sum_i^N\left[\|f(x_i^a)-f(x_i^p)\|^2_2-\|f(x_i^a)-f(x_i^n)\|_2^2+\alpha\right]_+$$
Therefore, in order to train the model we need a set of triplets \((x_i^a, x_i^p, x_i^n)\).
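The loss above translates almost directly into code. Here is a NumPy sketch operating on batches of precomputed embeddings (the toy vectors are illustrative, not real FaceNet outputs):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum over triplets of [||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + alpha]_+,
    where each argument is an (N, D) array of embeddings."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # ||f(a)-f(p)||_2^2
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # ||f(a)-f(n)||_2^2
    return float(np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0)))

# One toy triplet: the positive sits close to the anchor, the negative far away.
a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])
n = np.array([[1.0, 0.0]])
print(triplet_loss(a, p, n))  # constraint satisfied, so the loss is 0.0
```

In practice the embeddings come from the CNN, and this quantity is minimized with SGD as described above.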

Choice of Triplets

The choice of the triplets used in training is crucial. The most informative triplets turn out to be those with the greatest distance between \(f(x_i^a)\) and \(f(x_i^p)\) (hard positives) and the smallest distance between \(f(x_i^a)\) and \(f(x_i^n)\) (hard negatives). However, selecting only these can leave too little data or introduce biases into the model, since most positives \(x_i^p\) at a great distance from the anchor are poorly imaged faces. The two most important techniques employed to solve this problem are:

  1. Generating triplets offline every n steps of training
    Triplets are pre-generated every n steps, typically using all available data.
  2. Generating triplets online from mini-batches
    During each iteration, a mini-batch of samples is selected, and triplets are formed within it. This method adapts to the changing distribution of the data and can handle large datasets more efficiently.
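A schematic version of online mining within a mini-batch: for each anchor, pick the hardest positive and the hardest negative (the FaceNet paper additionally uses semi-hard negatives; the embeddings and labels below are toy values for illustration):

```python
import numpy as np

def mine_triplets(embeddings, labels):
    """For each anchor in the batch, choose the hardest positive
    (same label, largest distance) and the hardest negative
    (different label, smallest distance)."""
    # Pairwise squared Euclidean distances within the mini-batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    triplets = []
    for i, lab in enumerate(labels):
        pos_mask = labels == lab
        pos_mask[i] = False          # an anchor cannot be its own positive
        neg_mask = labels != lab
        if not pos_mask.any() or not neg_mask.any():
            continue                 # need at least one positive and one negative
        pos = np.where(pos_mask)[0][np.argmax(dist[i][pos_mask])]
        neg = np.where(neg_mask)[0][np.argmin(dist[i][neg_mask])]
        triplets.append((i, int(pos), int(neg)))
    return triplets

# Mini-batch of four embeddings covering two identities.
emb = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.1, 1.0]])
labels = np.array([0, 0, 1, 1])
print(mine_triplets(emb, labels))  # one (anchor, positive, negative) per anchor
```

Because the mining runs on each fresh mini-batch, the selected triplets track the current state of the embedding as training progresses.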

These techniques help ensure that the FaceNet model learns robust representations by exposing it to diverse triplets during training. By carefully selecting triplets, the model can generalize well to unseen faces and achieve high accuracy in face recognition tasks.

Face Recognition and Neuroscience

Less well known is the profound link between facial recognition technology and neuroscience.
In 2022, scientists at MIT’s McGovern Institute for Brain Research investigated the fusiform face area, a specialized region within the brain’s temporal lobe that is uniquely responsive to human faces. They wanted to understand why the brain designates separate regions for recognizing faces and objects. To probe this mystery, they trained a deep neural network on a vast dataset of images featuring various objects and faces, the network’s sole task being to differentiate between items such as bicycles, faces, and pens.

[Fusiform face area, Image from Wikipedia]

As the network refined its ability to identify these images, it began to self-organize in a manner strikingly similar to the human brain. Initial layers of the network processed visual information broadly, while the deeper layers evolved to focus specifically on faces. Intriguingly, the network achieved this specialized organization without explicit instructions, suggesting that both the artificial network and the human brain naturally evolve to optimize their processing efficiency. This study not only sheds light on the intrinsic connections between artificial intelligence and human cognition but also enhances our understanding of brain architecture itself.

[The whole article is available on Science, here]

[ Three networks with VGG16 architecture (left) were optimized, one on face identity categorization (Face CNN in red), one on object categorization (Object CNN in orange), and one on both tasks simultaneously (dual-task CNN in gray) ]

Conclusion

In conclusion, facial recognition technology has seamlessly integrated into our daily routines, offering convenience and enhanced security for tasks like unlocking phones and banking. Its fascinating link to neuroscience, highlighted by MIT research, reveals intriguing parallels between AI and human brain functions.

Advanced algorithms like Google’s FaceNet have improved accuracy, yet they must continue to address biases and perform well under diverse conditions. As facial recognition evolves, it’s crucial to balance its potential benefits with ethical considerations, ensuring that facial recognition enhances our lives without compromising personal freedoms.
