How Netflix Knows What You Like (and How You Can Build That Too)

A Developer’s Guide to Collaborative Filtering – The Foundation of Smart Recommendations

Have you ever wondered how Netflix seems to know what show you’ll binge next, or how Amazon surfaces a product you actually want? It’s not magic. It’s a recommendation system, and the technique behind many of them is collaborative filtering.

In this issue, we’ll break down what collaborative filtering is, why it’s so effective, and how you can build a simple version from scratch using Python and basic data science concepts.

What is Collaborative Filtering?

Collaborative filtering is a recommendation technique that uses the preferences and behavior of users to suggest items. The key idea:

People who agreed in the past will likely agree in the future.

There are two main types:

  1. User-based filtering – Recommends items liked by similar users.

  2. Item-based filtering – Recommends items similar to what the user has already liked.

Real-World Examples

  • Netflix: Suggests shows watched by users with similar viewing history.

  • Spotify: Recommends songs based on listening patterns of users with overlapping tastes.

  • YouTube: Suggests videos based on what similar users are clicking on.

Building a Simple User-User Collaborative Filter in Python

Let’s walk through a simple version using Python, pandas, and scikit-learn.

Step 1: Create a user-item matrix

import pandas as pd

# Sample dataset
data = {
    'User': ['Alice', 'Alice', 'Bob', 'Bob', 'Carol', 'Carol'],
    'Item': ['Inception', 'Titanic', 'Inception', 'Avatar', 'Titanic', 'Avatar'],
    'Rating': [5, 3, 5, 4, 3, 5]
}

df = pd.DataFrame(data)

# Create the pivot table
matrix = df.pivot_table(index='User', columns='Item', values='Rating')

Step 2: Compute similarity (here, cosine similarity)

from sklearn.metrics.pairwise import cosine_similarity

# Fill NaNs with 0s (note: this treats an unrated item like a rating of 0)
matrix_filled = matrix.fillna(0)

# Compute cosine similarity
user_similarity = cosine_similarity(matrix_filled)
similarity_df = pd.DataFrame(user_similarity, index=matrix.index, columns=matrix.index)
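
To see what that similarity score means, here is a minimal sketch that reproduces the cosine calculation by hand for the sample data, using only the standard library. The helper name `cosine` and the hand-written rating vectors are assumptions for illustration; the vectors follow the column order that `pivot_table` produces by default (alphabetical: Avatar, Inception, Titanic), with missing ratings filled with 0.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a user with no ratings is treated as similar to no one
        return 0.0
    return dot / (norm_u * norm_v)

# Zero-filled rating vectors in the order Avatar, Inception, Titanic
alice = [0, 5, 3]
bob = [4, 5, 0]
carol = [5, 0, 3]

alice_bob = cosine(alice, bob)      # roughly 0.67
alice_carol = cosine(alice, carol)  # roughly 0.26
print(round(alice_bob, 2), round(alice_carol, 2))
```

Alice and Bob both gave Inception a 5, so their vectors point in a similar direction and Bob comes out as Alice's nearest neighbor.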

Step 3: Recommend items

Pick the most similar user, and recommend their top-rated items that the current user hasn’t interacted with.

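def get_recommendations(user, matrix, similarity_df):
    # Most similar other user (drop the user's own entry, which is always 1.0)
    similar_user = similarity_df[user].drop(user).idxmax()

    # Items the current user has already rated
    user_items = matrix.loc[user].dropna().index
    # Ratings from the similar user; dropna() removes items they never rated
    similar_user_ratings = matrix.loc[similar_user].dropna()

    # Keep only items the current user hasn't rated, highest-rated first
    recommendations = similar_user_ratings.drop(user_items, errors='ignore').sort_values(ascending=False)
    return recommendations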

print(get_recommendations('Alice', matrix, similarity_df))
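
Relying on a single nearest neighbor can be brittle. A common variant, sketched below, scores each unseen item with a similarity-weighted average over all other users who rated it. This is not part of the walkthrough above: the function name `predict_scores` is hypothetical, and the similarity matrix is rebuilt with NumPy so the block runs on its own.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({
    'User': ['Alice', 'Alice', 'Bob', 'Bob', 'Carol', 'Carol'],
    'Item': ['Inception', 'Titanic', 'Inception', 'Avatar', 'Titanic', 'Avatar'],
    'Rating': [5, 3, 5, 4, 3, 5],
})
matrix = ratings.pivot_table(index='User', columns='Item', values='Rating')

# Cosine similarity between users on the zero-filled matrix (NumPy only)
filled = matrix.fillna(0).to_numpy(dtype=float)
norms = np.linalg.norm(filled, axis=1, keepdims=True)
similarity_df = pd.DataFrame(
    (filled @ filled.T) / (norms @ norms.T),
    index=matrix.index, columns=matrix.index,
)

def predict_scores(user, matrix, similarity_df):
    """Similarity-weighted average rating for items the user has not rated."""
    weights = similarity_df[user].drop(user)   # similarity to every other user
    others = matrix.drop(index=user)           # their ratings, NaN where unrated
    rated = others.notna().astype(float)       # 1.0 where a rating exists
    # Weighted sum of ratings, divided by the total weight of users who rated the item
    weighted_sum = others.fillna(0).mul(weights, axis=0).sum()
    weight_total = rated.mul(weights, axis=0).sum()
    scores = weighted_sum / weight_total
    unseen = matrix.loc[user].isna()
    return scores[unseen].dropna().sort_values(ascending=False)

alice_scores = predict_scores('Alice', matrix, similarity_df)
print(alice_scores)
```

For Alice, the only unseen item is Avatar, and its predicted score lands between Bob's 4 and Carol's 5, pulled toward Bob because he is the closer neighbor.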

Pros & Pitfalls

Pros:

  • Doesn’t require deep domain knowledge or item metadata.

  • Works well with many users and varied content.

Pitfalls:

  • Cold Start Problem: Hard to recommend for new users or items with no interactions.

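  • Sparsity: Most users rate only a small fraction of the catalog, so the user-item matrix is mostly empty and similarity scores rest on very few shared ratings.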

  • Scalability: Gets slow with millions of users/items without optimization.
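
Sparsity and cold start are easy to measure before you build anything. The sketch below, under the assumption that your data fits in a pandas pivot table like the one in Step 1, reports what fraction of the matrix is empty and which users have too few ratings to compare. The `min_ratings` threshold is a hypothetical value chosen for illustration.

```python
import pandas as pd

ratings = pd.DataFrame({
    'User': ['Alice', 'Alice', 'Bob', 'Bob', 'Carol', 'Carol'],
    'Item': ['Inception', 'Titanic', 'Inception', 'Avatar', 'Titanic', 'Avatar'],
    'Rating': [5, 3, 5, 4, 3, 5],
})
matrix = ratings.pivot_table(index='User', columns='Item', values='Rating')

# Sparsity: share of user-item cells that have no rating
n_users, n_items = matrix.shape
n_ratings = int(matrix.notna().sum().sum())
sparsity = 1 - n_ratings / (n_users * n_items)
print(f"{n_ratings} ratings in {n_users * n_items} cells, sparsity = {sparsity:.2f}")

# Cold start: users with fewer ratings than a chosen threshold
min_ratings = 2  # hypothetical cutoff, tune for your data
ratings_per_user = matrix.notna().sum(axis=1)
cold_users = ratings_per_user[ratings_per_user < min_ratings].index.tolist()
print("Users below threshold:", cold_users)
```

The toy dataset is only about one third empty; real rating matrices are typically far sparser.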

Tools to Scale It Up

If you’re building a real-world system, explore:

  • Surprise: A Python library for recommender systems.

  • LightFM: Hybrid models using collaborative and content-based filtering.

  • Apache Mahout / Spark MLlib: For distributed processing.

Where to Use Collaborative Filtering

  • E-commerce: Product recommendations

  • Media streaming: Shows, songs, or podcast suggestions

  • Education: Courses or resources users may enjoy

  • Social: People you may know, or content you might like

Next Steps

Once you’ve mastered basic collaborative filtering:

  • Try item-based filtering using item-item similarity.

  • Build hybrid systems combining collaborative and content-based methods.

  • Experiment with matrix factorization techniques like SVD.
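
As a starting point for the first suggestion, here is a minimal item-based sketch on the same sample data. It compares item columns instead of user rows, then scores each unseen item by how similar it is to items the user has already rated. The function name `recommend_item_based` is hypothetical, and the weighted-average scoring is one common choice rather than the only one.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({
    'User': ['Alice', 'Alice', 'Bob', 'Bob', 'Carol', 'Carol'],
    'Item': ['Inception', 'Titanic', 'Inception', 'Avatar', 'Titanic', 'Avatar'],
    'Rating': [5, 3, 5, 4, 3, 5],
})
matrix = ratings.pivot_table(index='User', columns='Item', values='Rating')

# Item vectors are the columns of the user-item matrix, so transpose first
item_vectors = matrix.fillna(0).T.to_numpy(dtype=float)
norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
item_similarity_df = pd.DataFrame(
    (item_vectors @ item_vectors.T) / (norms @ norms.T),
    index=matrix.columns, columns=matrix.columns,
)

def recommend_item_based(user, matrix, item_similarity_df):
    """Score unrated items by similarity to the items the user has rated."""
    user_ratings = matrix.loc[user].dropna()
    unseen = matrix.columns.difference(user_ratings.index)
    # Rows: unseen items; columns: items the user rated
    sims = item_similarity_df.loc[unseen, user_ratings.index]
    # Similarity-weighted average of the user's own ratings
    scores = sims.mul(user_ratings, axis=1).sum(axis=1) / sims.sum(axis=1)
    return scores.dropna().sort_values(ascending=False)

alice_item_scores = recommend_item_based('Alice', matrix, item_similarity_df)
print(alice_item_scores)
```

Item-item similarities tend to change more slowly than user-user similarities, which is one reason they are often precomputed in larger systems.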

If you’d like a ready-to-run notebook of this implementation, just reply “SEND REPO” and we’ll deliver it.

Until next time—build smart, scale smart.
The Nullpointer Club Team