You can code up a logistic regression blindfolded and explain the nuances of transformers in your sleep. But what do you do when the interviewer asks you to design a system? That question isn’t about code or algorithms; it’s about ML system design: product thinking, architecture, and trade-offs. So, in this article, I’ll explain how to solve ML system design interview problems.
The Goal in ML System Design Isn’t the Right Answer, It’s the Right Thought Process
In ML system design interview problems, there is no single correct answer. The interviewer isn’t looking for a perfect, production-ready blueprint. They want to see how you think. Can you break down a massive, ambiguous problem into manageable pieces? Can you justify your decisions? Can you reason about scale, users, and business impact?
To do that, you need a reliable framework. Let’s walk through one using a popularly asked problem: Design a YouTube Video Recommendation System.
Step 1: Ask Questions First!
Don’t jump into solutions. Your first instinct should be to clarify. The problem “design a YouTube recommendation system” is intentionally vague. Your job is to narrow it down.
Start asking questions like a consultant:
- Objective: What is the primary goal? To increase user engagement? Maximize watch time? Let’s assume the main goal is to maximize user watch time. This is a crucial decision that will guide all your later choices.
- Scope: Where will these recommendations appear? On the homepage? In the sidebar? As notifications? Let’s focus on the homepage recommendations for a logged-in user.
- Scale: How many users and videos are we talking about? Billions of videos and hundreds of millions of active users. That scale immediately tells you that efficiency and latency are critical.
- Constraints: Are there any latency requirements? Recommendations should load almost instantly, say, under 200 milliseconds.
By the end of this step, you’ve turned a vague problem into a concrete goal: to design a system that recommends videos on the homepage for logged-in users, maximizing watch time while serving millions concurrently with a latency of under 200 milliseconds.
Step 2: Define Your Metrics
How do you know if your system is working? You need to define success metrics. It’s good practice to separate them into two categories:
- Offline Metrics: These are what you use during training and testing. Examples include Precision, Recall, and Mean Average Precision (MAP). These tell you if your model is good at predicting what a user might like.
- Online Metrics: These are what you measure in the real world through A/B testing. They include metrics like Click-Through Rate (CTR), Watch Time, and Session Length / Session Depth.
At this step, your answer should be: we’ll train our model to optimize for offline metrics like precision, but we’ll ultimately judge its success with online A/B tests focused on watch time and user session length.
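It can help to show you know how these offline metrics are actually computed. Here is a minimal sketch of precision@k and average precision; the video IDs and the relevant set are purely illustrative:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations the user actually engaged with."""
    top_k = recommended[:k]
    hits = sum(1 for video_id in top_k if video_id in relevant)
    return hits / k

def average_precision(recommended, relevant, k):
    """Average of the precision values at each rank where a relevant video appears.
    Averaging this across users gives Mean Average Precision (MAP)."""
    hits, score = 0, 0.0
    for rank, video_id in enumerate(recommended[:k], start=1):
        if video_id in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0

# Hypothetical example: the model recommended 5 videos; the user watched 2 of them.
recommended = ["v1", "v2", "v3", "v4", "v5"]
relevant = {"v2", "v5"}
print(precision_at_k(recommended, relevant, 5))  # 0.4
```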
Step 3: Sketch the High-Level Architecture
Now, let’s draw the map before we explore the territory. A typical ML system has a few key components. Don’t get bogged down in details yet; just show you understand the flow.
A great recommendation system often uses a two-stage process:
- Candidate Generation: From billions of videos, quickly select a few hundred potentially relevant candidates for a specific user. The goal here is recall: don’t miss anything good.
- Ranking: Take those few hundred candidates and score them precisely. The goal here is precision; put the absolute best video at the top.
Your high-level diagram might look something like this:
User Action -> Data Collection -> Candidate Generation -> Ranking -> Final Recommendations
This two-stage approach is a classic pattern because it solves the scale problem: it would be computationally infeasible to run a complex model on every single video for every user.
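The two-stage flow can be sketched in a few lines. Note that `retrieve_candidates` and `score` are hypothetical stand-ins: in production, retrieval would be an approximate nearest-neighbour lookup over embeddings and scoring would be a trained ranking model, not the stubs below:

```python
def retrieve_candidates(user_id, corpus, limit=300):
    """Stage 1: a cheap filter from the full corpus down to a few hundred videos.
    (Stubbed here as a simple attribute filter for illustration.)"""
    return [v for v in corpus if v["language"] == "en"][:limit]

def score(user_id, video):
    """Stage 2: an expensive, precise per-video score (stubbed here)."""
    return video["predicted_watch_minutes"]

def recommend(user_id, corpus, n=10):
    candidates = retrieve_candidates(user_id, corpus)           # recall-oriented
    ranked = sorted(candidates, key=lambda v: score(user_id, v), reverse=True)
    return ranked[:n]                                           # precision-oriented
```

The point of the sketch is the shape: a fast, broad first pass followed by a slow, precise second pass over a much smaller set.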
Step 4: Talk About Data and Features
This is the heart of any ML system. What data will you use? How will you turn it into signals (features) for your model?
In our example:
- We can talk about data sources like user data, video data (content features), and interaction data (collaborative signals).
- We can also talk about feature engineering steps like creating embeddings to represent users and video features, and contextual features like time of day and device type.
A great point to add here is the concept of freshness. A video about a news event from yesterday is more relevant than one from last year. This is a critical feature.
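One way to encode freshness is as an exponentially decaying signal, so a day-old video scores near 1.0 and an old upload scores near 0. The half-life and the feature names below are illustrative assumptions, not a fixed recipe:

```python
import math
import time

def freshness_score(upload_timestamp, now=None, half_life_days=7.0):
    """Exponential decay: 1.0 for a brand-new video, 0.5 after one half-life."""
    now = now if now is not None else time.time()
    age_days = max(0.0, (now - upload_timestamp) / 86400.0)
    return 0.5 ** (age_days / half_life_days)

def build_features(user, video, context, now=None):
    """Assemble a flat feature vector; the field names are hypothetical."""
    return {
        "freshness": freshness_score(video["upload_ts"], now),
        "hour_of_day": float(context["hour"]),
        "is_mobile": 1.0 if context["device"] == "mobile" else 0.0,
        "topic_match": float(video["topic"] in user["favorite_topics"]),
    }
```

A feature dictionary like this would then be vectorized and fed to the ranking model alongside the learned embeddings.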
Step 5: Choose Your Models
This is where many juniors either oversimplify or get lost in the weeds. The key is to show you understand trade-offs.
For the candidate generation model, we can start with collaborative filtering as a baseline. It’s simple, powerful, and a great starting point: the idea is that users who watched X also watched Y. If this doesn’t meet the bar, we can move to a two-tower neural network that learns user and video embeddings separately and finds the closest matches efficiently.
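The "users who watched X also watched Y" idea reduces to counting co-watches. A minimal item-based sketch, with made-up video IDs:

```python
from collections import Counter, defaultdict

def build_cooccurrence(watch_histories):
    """Count how often each pair of videos is watched by the same user."""
    co = defaultdict(Counter)
    for history in watch_histories:
        unique = set(history)
        for a in unique:
            for b in unique:
                if a != b:
                    co[a][b] += 1
    return co

def similar_videos(co, video_id, k=5):
    """'Users who watched X also watched Y', ranked by co-watch count."""
    return [v for v, _ in co[video_id].most_common(k)]

# Three hypothetical users' watch histories:
histories = [["cats", "dogs"], ["cats", "dogs"], ["cats", "birds"]]
co = build_cooccurrence(histories)
print(similar_videos(co, "cats"))  # ['dogs', 'birds']
```

At YouTube scale you would never enumerate pairs naively like this; the point is only to show the baseline's logic before discussing the two-tower upgrade.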
For the ranking model, we can start with a simple model like Logistic Regression or a Gradient Boosted Decision Tree (like XGBoost). It’s fast, interpretable, and can handle a lot of features. The model would predict the probability of a click or, even better, the expected watch time. If the baseline plateaus, we can move to a deep neural network that captures complex, non-linear interactions between user and video features. This is closer to what YouTube actually uses.
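To make the logistic-regression baseline concrete, here is a dependency-free sketch trained with plain gradient descent on a toy dataset of two features (freshness and topic match, both invented for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Stochastic gradient descent on log loss; fine on a whiteboard, not at scale."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Toy data: [freshness, topic_match] -> clicked (1) or not (0)
X = [[0.9, 1.0], [0.8, 1.0], [0.2, 0.0], [0.1, 0.0]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
rank_score = sigmoid(sum(wj * xj for wj, xj in zip(w, [0.95, 1.0])) + b)
```

At serving time, each candidate's `rank_score` is what the ranking stage sorts by; in practice you would use a library implementation rather than hand-rolled SGD.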
Step 6: Closing the Loop: Serving, Scaling, and Iterating
You’ve designed the system, but how does it run in the real world? Talk about:
- Serving: How do you meet that 200ms latency requirement? Most homepage recommendations can be pre-computed in a batch process overnight and stored in a fast key-value store (like Redis). When a user logs in, you just fetch their pre-computed list. For real-time signals, you can have a smaller, faster model that updates the top of the ranked list.
- A/B Testing: You never deploy a new model to 100% of users. You test it on a small percentage (say, 1%) and compare its online metrics against the old model. This is non-negotiable in any large-scale system.
- Feedback Loop: The system must learn. When a user watches or skips a video, that data should be fed back into the system to retrain and improve the models continuously. This feedback loop is what separates a static model from a living ML system.
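The serving and A/B-testing points above can be sketched together. A plain dict stands in for a key-value store like Redis, and the bucketing hash is one common approach (the key names and percentages are illustrative):

```python
import hashlib

store = {}  # stand-in for a fast key-value store such as Redis

def precompute_recommendations(user_ids, ranker):
    """Nightly batch job: rank once per user, cache the result."""
    for user_id in user_ids:
        store[f"recs:{user_id}"] = ranker(user_id)

def serve_homepage(user_id, fallback):
    """Request path: a single cache lookup keeps latency well under 200 ms."""
    return store.get(f"recs:{user_id}", fallback)

def ab_bucket(user_id, treatment_pct=1):
    """Deterministically route ~treatment_pct% of users to the new model.
    A stable hash (not Python's salted hash()) keeps assignment consistent
    across processes and restarts."""
    digest = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return "treatment" if digest % 100 < treatment_pct else "control"
```

The key property of `ab_bucket` is determinism: the same user always lands in the same bucket, so online metrics for treatment and control stay cleanly separated.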
So, this is how you design an ML system for a YouTube Video Recommendation System. I hope this helps you learn to solve ML system design interview problems.
Final Words
Remember these key principles:
- Clarify first, solve second.
- Start simple and iterate. A working baseline is better than a perfect but unimplemented idea.
- It’s all about the trade-offs. Speed vs. accuracy. Complexity vs. maintainability.
- Think about the user and the business goal. Technology serves a purpose.
Next time you get an ML system design problem, take a deep breath and start with, “That’s a great question. Before we dive into models, let’s first define what we’re trying to achieve…”
I hope you liked this article on how to solve ML system design interview problems. Feel free to ask your questions in the comments section below. You can follow me on Instagram for many more resources.