How to Solve the Top N Per Category Problem in SQL

For many of us in AI and Machine Learning, Python feels like home, but SQL feels like a foreign country we only visit for quick data-pulling trips. We master the basics, like SELECT, WHERE, and JOIN, but when faced with a slightly more complex analytical query, we freeze. Today, we’re going to learn how to solve a classic SQL challenge based on the Top N Per Category Problem.

The Challenge: Top N Per Category

Let’s set the scenario. We have two simple tables: Employees and Departments.

Table: Employees
EmployeeID | Name | Salary | DepartmentID
-----------|---------|--------|--------------
1 | Alice | 90000 | 1
2 | Bob | 85000 | 1
3 | Charlie | 85000 | 1
4 | David | 80000 | 1
5 | Eve | 95000 | 2
6 | Frank | 92000 | 2
7 | Grace | 88000 | 2

Table: Departments
DepartmentID | DepartmentName
-------------|----------------
1 | Engineering
2 | Marketing

The goal is to write a query to find the top 3 salaries in each department.

The Instinctive (but Flawed) Approach

The first instinct for many is to reach for GROUP BY. Something like this:

SELECT
    d.DepartmentName,
    MAX(e.Salary)
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID
GROUP BY d.DepartmentName;

This query gives you the single highest salary in each department. But what about the second? Or the third? GROUP BY aggregates all the rows in a group into a single summary row. It can’t show you the individual employees who make up the top 3. It is where many people get stuck and start thinking about self-joins or unions.

Start using Window Functions

Let’s talk about one of the most powerful concepts in modern SQL, window functions.

Let’s say you’re calculating something, but you don’t want to collapse your rows like GROUP BY does. You want to perform a calculation across a set of related rows while keeping the individual rows intact. This is exactly what window functions do.

For our problem, we want to look at each department as its own little window, rank the employees by salary within that window, and then move on to the next one.

The function we’ll use is DENSE_RANK(). For finding the top N salaries, DENSE_RANK() is usually the best choice because if two employees are tied for the second-highest salary, we want to include both of them as rank #2.

The Step-by-Step SQL Solution for the Top N Per Category Problem

The first step is to Rank Employees within Each Department (The CTE). First, we’ll write a query to select all the employee information and add a new column for the salary rank within their department:

WITH RankedSalaries AS (
    SELECT
        e.Name,
        e.Salary,
        d.DepartmentName,
        DENSE_RANK() OVER (PARTITION BY d.DepartmentName ORDER BY e.Salary DESC) as salary_rank
    FROM Employees e
    JOIN Departments d ON e.DepartmentID = d.DepartmentID
)

This CTE, RankedSalaries, now acts like a temporary table that looks like this:

Name    | Salary | DepartmentName | salary_rank
--------|--------|----------------|-------------
Alice | 90000 | Engineering | 1
Bob | 85000 | Engineering | 2
Charlie | 85000 | Engineering | 2
David | 80000 | Engineering | 3
Eve | 95000 | Marketing | 1
Frank | 92000 | Marketing | 2
Grace | 88000 | Marketing | 3

Now that we have our ranked list, the next step is incredibly simple. We just query our CTE and filter for the ranks we want:

SELECT
    Name,
    Salary,
    DepartmentName,
    salary_rank
FROM RankedSalaries
WHERE salary_rank <= 3;
Name    | Salary | DepartmentName
--------|--------|----------------
Alice | 90000 | Engineering
Bob | 85000 | Engineering
Charlie | 85000 | Engineering
David | 80000 | Engineering
Eve | 95000 | Marketing
Frank | 92000 | Marketing
Grace | 88000 | Marketing

And there you have it! The final result is a clean, readable list of the employees with the top 3 salaries in each department.

Final Words

This technique will help you quickly identify leaders, laggards, and outliers within any category, be it products, regions, or users, which is crucial for forming hypotheses. You can find many more SQL interview problems for Data Science here. I hope you liked this article on how to solve a classic SQL challenge based on the Top N Per Category Problem. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.

Aman Kharwal
Aman Kharwal

AI/ML Engineer | Published Author. My aim is to decode data science for the real world in the most simple words.

Articles: 2090

Leave a Reply

Discover more from AmanXai by Aman Kharwal

Subscribe now to keep reading and get access to the full archive.

Continue reading