Unveiling Netflix’s AI: From Data to Personalized Recommendations
Authors: Matteo Mello Grand, Riccardo Scibetta
Netflix is the leading streaming platform worldwide, commanding a market capitalization of nearly a quarter of a trillion dollars and counting more than 260 million subscribers. Netflix, however, is not only an entertainment company. In fact, 75% of the content that users consume on the platform is recommended to them by an algorithm. For this reason, AI plays a pivotal role in the company’s success.
In this article we will present a general overview of the evolution and current state of the recommendation system, followed by a more technical explanation of the title ranking algorithm.
Netflix and Recommendation Systems
Netflix's history with machine learning is long: it dates back to 2005, when Netflix was still a DVD rental company collecting just a few data points on item selection and rental times.
In 2006, shortly before launching its on-demand streaming service, the company announced the Netflix Prize: an open competition with a $1,000,000 prize for the best content recommendation algorithm. The competition attracted great interest across the industry, but the winning team's solution ended up not being implemented because of its complexity. (The dataset is now available on Kaggle.)
Fast forward to today, and Netflix collects an impressive amount of data. First, its titles provide rich metadata: genre, actors, year of release, online reviews and critics' opinions. Second, it records the implicit behavior of its users, capturing the duration, intensity and time of day of each stream. Finally, Netflix collects millions of explicit ratings on its titles every day, which play a pivotal role in understanding users' tastes, as we will see later on.
Extensive Personalization
This data is used to provide a fully personalized user interface. Not only do the suggested titles change from one user to another, but so do the trailer that is shown, the description and the artwork of each movie. For example, someone who likes romantic movies may become interested in Good Will Hunting if they see artwork featuring Matt Damon and Minnie Driver. Likewise, a fan of comedies may be drawn to the same movie if the artwork features Robin Williams.
To choose among the different artworks, Netflix employs a contextual bandits approach.
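To give an idea of what a contextual bandit does, here is a minimal epsilon-greedy sketch in plain Python. It is not Netflix's actual system: the two user contexts, the two artworks, the click probabilities and the names `run_bandit`, `ARTWORKS`, `CONTEXTS` and `TRUE_CLICK_RATE` are all invented for illustration. The bandit shows an artwork, observes whether the simulated user clicks, and gradually learns which artwork works best for each kind of user.

```python
import random

ARTWORKS = ["romance_artwork", "comedy_artwork"]  # hypothetical arms
CONTEXTS = ["romance_fan", "comedy_fan"]          # hypothetical user contexts

# Invented click probabilities, used only to simulate user feedback.
TRUE_CLICK_RATE = {
    ("romance_fan", "romance_artwork"): 0.8,
    ("romance_fan", "comedy_artwork"): 0.2,
    ("comedy_fan", "romance_artwork"): 0.2,
    ("comedy_fan", "comedy_artwork"): 0.8,
}


def run_bandit(rounds=5000, epsilon=0.1, seed=0):
    """Return the artwork the bandit prefers for each context after training."""
    rng = random.Random(seed)
    shows = {key: 0 for key in TRUE_CLICK_RATE}   # times each artwork was shown
    clicks = {key: 0 for key in TRUE_CLICK_RATE}  # clicks observed for each pair

    def estimated_rate(context, artwork):
        n = shows[(context, artwork)]
        return clicks[(context, artwork)] / n if n else 0.0

    for _ in range(rounds):
        context = rng.choice(CONTEXTS)            # a user arrives
        if rng.random() < epsilon:
            artwork = rng.choice(ARTWORKS)        # explore: try a random artwork
        else:                                     # exploit: use the best estimate
            artwork = max(ARTWORKS, key=lambda a: estimated_rate(context, a))
        clicked = rng.random() < TRUE_CLICK_RATE[(context, artwork)]
        shows[(context, artwork)] += 1
        clicks[(context, artwork)] += int(clicked)

    return {c: max(ARTWORKS, key=lambda a: estimated_rate(c, a)) for c in CONTEXTS}


print(run_bandit())
```

The small exploration rate `epsilon` is what lets the bandit keep testing artworks it currently believes to be worse, so that an unlucky start does not lock in a bad choice.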
Other parts of the interface are optimized with a variety of techniques. For example, feed-forward neural networks are used in the search function, continually adjusting the possible results according to the user's past viewing history.
Title Choice and Ranking
The recommendation system determines the way titles are shown on the Netflix interface, prioritizing those most likely to resonate with the user's tastes and mood. With a window of a mere 60 to 90 seconds to capture the user's attention, the movies are arranged in order of relevance, with the most promising ones shown at the top and on the left.
To rank the movies Netflix employs collaborative filtering, a technique based on the assumption that individuals who have shown similar preferences in the past will continue to have similar preferences in the future. Hence, by analyzing the films liked by users whose profile is similar to the target user's, we can predict which films they are most likely to enjoy.
How Collaborative Filtering Works
One of the most efficient collaborative filtering algorithms is called matrix factorization. This technique starts by creating the “matrix of insights”, a matrix whose columns represent the movies and whose rows represent the users. Each entry of the matrix, at the intersection of a user and a movie, holds the rating that the user gave to that movie.
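A toy matrix of insights can be written down directly in Python. The users, movies and ratings below are invented for illustration, and `None` is our own convention for an entry with no rating; the helper `missing_entries` is likewise hypothetical.

```python
users = ["Alice", "Bob", "Carla"]                       # rows
movies = ["Movie A", "Movie B", "Movie C", "Movie D"]   # columns

# Ratings on a 1-5 scale; None marks a movie the user has not rated.
ratings = [
    [5, 3, None, 1],     # Alice
    [4, None, None, 1],  # Bob
    [None, 1, 5, 4],     # Carla
]


def missing_entries(matrix):
    """Return the (row, column) positions that still have no rating."""
    return [(u, m)
            for u, row in enumerate(matrix)
            for m, rating in enumerate(row)
            if rating is None]


for u, m in missing_entries(ratings):
    print(f"{users[u]} has not rated {movies[m]}")
```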
In practice, not all the entries are filled: the empty ones correspond to movies that a user has not seen or rated. The goal of the algorithm is to predict the ratings of these not-yet-seen movies, and then to suggest to every user the unseen movies with the highest predicted ratings.
Rating Normalization
The first step of the algorithm is to normalize the ratings in the matrix of insights. Different users have different evaluation criteria, so they may rate movies according to different standards. Consider for example the ratings of a movie critic and of a child: the critic will probably tend to give lower ratings. To remove this divergence in users' rating criteria, we can normalize the ratings. The simplest way to do it is to subtract from each entry of a row (a user's ratings) the mean rating of that row. By applying this procedure to the matrix of insights, we get a new matrix in which the ratings follow a common evaluation criterion.
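The row-wise normalization described above can be sketched in a few lines of plain Python. The two users and their ratings are invented, `None` is our convention for a missing rating, and `normalize_rows` is a hypothetical helper name.

```python
def normalize_rows(matrix):
    """Subtract each user's mean rating from that user's known ratings."""
    normalized, means = [], []
    for row in matrix:
        known = [r for r in row if r is not None]
        mean = sum(known) / len(known) if known else 0.0
        means.append(mean)
        # Missing ratings stay missing; known ratings are centered on zero.
        normalized.append([None if r is None else r - mean for r in row])
    return normalized, means


ratings = [
    [2, 3, None, 1],  # a strict "critic"
    [5, None, 4, 3],  # a generous "child"
]
normalized, means = normalize_rows(ratings)
print(means)        # [2.0, 4.0]
print(normalized)   # [[0.0, 1.0, None, -1.0], [1.0, None, 0.0, -1.0]]
```

After centering, both users agree that the last movie is one point below their personal average, even though their raw ratings for it (1 and 3) look very different. Keeping `means` around makes it possible to add each user's mean back to a predicted value later, to express it on the original scale.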
Once we get the normalized matrix, we are ready to start the core of the algorithm.
Factorization and Stochastic Gradient Descent
The core of the matrix factorization technique consists in factoring the matrix of insights as a product of two lower-rank matrices, which respectively represent the user and item embeddings in a common factor space. Here you can see a representation:
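In symbols, the factorization can be sketched as follows. The notation is our own assumption rather than a copy of the original figure: R is the matrix of insights with m users and n movies, k is the number of factors, U holds one row of k factor values per user, and V holds one column of k factor values per movie. The predicted rating of user u for movie i is then the product of the corresponding row of U and column of V.

```latex
\[
\begin{aligned}
R &\approx U V, \qquad
R \in \mathbb{R}^{m \times n},\;
U \in \mathbb{R}^{m \times k},\;
V \in \mathbb{R}^{k \times n} \\
\hat{r}_{ui} &= \sum_{f=1}^{k} U_{uf}\, V_{fi}
\end{aligned}
\]
```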
The number of factors is decided a priori. The factors themselves may correspond to recognizable movie characteristics, such as the genre or the music, or simply to abstract categories.
The problem we face is that initially we only know the structure of the matrix of insights, and nothing about the entries of the two embedding matrices. How can we find them?
To find the entries of the two matrices, we start by assigning them random values. Then we define a loss function and apply the stochastic gradient descent (SGD) procedure to minimize it. The loss function used in matrix factorization is the squared distance:
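Written out, a standard form of this loss is given below. The notation is our own assumption: r_{ui} is the rating of user u for movie i, U and V are the user and movie factor matrices with k factors, and Ω is the set of filled entries of the matrix of insights.

```latex
\[
L(U, V) = \sum_{(u,i) \in \Omega}
\Bigl( r_{ui} - \sum_{f=1}^{k} U_{uf}\, V_{fi} \Bigr)^{2}
\]
```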
This loss computes the sum of the squared differences, taken over the filled entries of the matrix of insights, between the value of an entry and the value of the same entry in the product of the two factor matrices. The stochastic gradient descent procedure allows us to find the two embedding matrices that minimize the loss function. We finally get two factor matrices whose product approximates the known entries of the matrix of insights and fills the empty ones with predicted ratings.
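The whole procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Netflix's implementation: the toy ratings, the function names (`factorize`, `predict`, `squared_loss`) and the hyperparameters (`n_factors`, `lr`, `epochs`) are all invented, and raw ratings are used for brevity where the normalized ones would normally be fed in.

```python
import random


def factorize(ratings, n_factors=2, lr=0.01, epochs=2000, seed=0):
    """Learn U (users x factors) and V (factors x movies) by SGD on filled entries."""
    rng = random.Random(seed)
    n_users, n_movies = len(ratings), len(ratings[0])
    # Start from small random values, as described in the text.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(n_movies)] for _ in range(n_factors)]
    # Only filled entries contribute to the loss.
    observed = [(u, i, r) for u, row in enumerate(ratings)
                for i, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        rng.shuffle(observed)  # "stochastic": visit the ratings in random order
        for u, i, r in observed:
            err = r - sum(U[u][f] * V[f][i] for f in range(n_factors))
            for f in range(n_factors):
                u_f, v_f = U[u][f], V[f][i]
                # Gradient of err**2 is -2*err*v_f w.r.t. u_f and -2*err*u_f w.r.t. v_f.
                U[u][f] += lr * 2 * err * v_f
                V[f][i] += lr * 2 * err * u_f
    return U, V


def predict(U, V):
    """Multiply the two factor matrices to get a fully filled rating matrix."""
    n_factors, n_movies = len(V), len(V[0])
    return [[sum(row[f] * V[f][i] for f in range(n_factors)) for i in range(n_movies)]
            for row in U]


def squared_loss(ratings, predicted):
    """Sum of squared errors over the filled entries only."""
    return sum((r - predicted[u][i]) ** 2
               for u, row in enumerate(ratings)
               for i, r in enumerate(row) if r is not None)


ratings = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [None, 1, 5, 4],
]
U, V = factorize(ratings)
predicted = predict(U, V)
print("loss after training:", round(squared_loss(ratings, predicted), 4))
```

Each update touches only one row of U and one column of V, which is what keeps a single SGD step cheap even when the matrix of insights is very large.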
Now that we know the predicted ratings of the not-yet-seen movies, we can suggest to every user the movies with the highest predicted ratings, which are the films that the user will most probably like.
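This last step amounts to sorting the unseen movies of a user by predicted rating. A minimal sketch, with invented ratings and predictions and a hypothetical helper named `recommend`:

```python
def recommend(ratings, predicted, user, top_n=2):
    """Return the indices of the user's unseen movies, best predicted rating first."""
    unseen = [i for i, r in enumerate(ratings[user]) if r is None]
    return sorted(unseen, key=lambda i: predicted[user][i], reverse=True)[:top_n]


# One user, five movies; None marks a movie the user has not rated.
ratings = [[5, None, None, 1, None]]
# Invented output of a trained model for the same user.
predicted = [[4.8, 3.9, 4.5, 1.2, 2.0]]

print(recommend(ratings, predicted, user=0))  # [2, 1]
```

Movies the user has already rated (columns 0 and 3 here) are excluded, even when their predicted rating is the highest in the row.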
Matrix factorization is therefore a collaborative filtering algorithm that efficiently predicts ratings for not-yet-seen movies and recommends to each user the movies with the highest predicted ratings. It is especially efficient when the matrix of insights is very large. For example, by factoring a 2000×1000 matrix into two matrices of sizes 2000×100 and 100×1000, instead of storing 2000×1000 = 2,000,000 entries we only need to store 2000×100 + 100×1000 = 300,000 entries.
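The storage arithmetic is easy to check for other sizes as well. The two helper functions below are hypothetical names introduced only for this check.

```python
def entries_full(n_users, n_movies):
    """Number of entries in the full matrix of insights."""
    return n_users * n_movies


def entries_factored(n_users, n_movies, n_factors):
    """Number of entries in the two factor matrices combined."""
    return n_users * n_factors + n_factors * n_movies


full = entries_full(2000, 1000)                 # 2000000
factored = entries_factored(2000, 1000, 100)    # 300000
print(full, factored, round(full / factored, 2))
```

The saving grows with the size of the matrix: for a fixed number of factors, the full matrix grows with the product of users and movies, while the factored form grows only with their sum.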
Conclusion
Picking a movie on Netflix might feel like a breeze, but there’s a lot going on behind the scenes to make that happen. The recommendation system isn’t just about matching tastes with collaborative filtering; it’s a complex, state-of-the-art process powered by advanced machine learning. Imagine hundreds of top-notch researchers from around the globe, all putting their minds together to make this system as smart as it can be. We’ve tried to peel back some of the layers to give you a glimpse into this intricate world, hoping to make the complex a bit more understandable.