Data Scientists use SQL to solve complex data manipulation, aggregation, and transformation problems. In Data Science interviews, Interviewers often test your SQL knowledge to work with structured and semi-structured data, create readable queries, and handle real-world challenges like hierarchy, JSON, and summarization. So, if you are preparing for Data Science interviews and looking for practice problems based on SQL, this article is for you. In this article, I’ll take you through a guide to essential SQL concepts for Data Science interviews, with example questions.
SQL Concepts for Data Science Interviews
Below are must-know SQL concepts for Data Science interviews, each explained in detail with an example question.
Window Functions
Window functions perform calculations across a set of rows related to the current row without collapsing the result into a single output. They are useful for running totals, rankings, and moving averages.
Example Question: You have a sales table with columns: sale_id, sale_date, store_id, and amount. Write a query to calculate the running total of sales for each store ordered by date.
Here’s how to solve this SQL problem:
SELECT
store_id,
sale_date,
amount,
SUM(amount) OVER (PARTITION BY store_id ORDER BY sale_date) AS running_total
FROM
sales
ORDER BY
store_id, sale_date;The output will show each store’s store_id, sale_date, amount, and a cumulative running_total of sales up to each date, ordered by store_id and sale_date.
Common Table Expressions (CTEs)
CTEs allow you to create temporary result sets that can be referred to within the same query. They improve query readability and can be used for recursive queries.
Example Question: You have an employees table with columns: employee_id, manager_id, and salary. Write a query to calculate the total salary expense for each manager, including their direct reports.
Here’s how to solve this SQL problem:
WITH ManagerSalaries AS (
SELECT
manager_id,
SUM(salary) AS total_salary
FROM
employees
GROUP BY
manager_id
)
SELECT
m.manager_id,
e.employee_id,
e.salary,
m.total_salary
FROM
ManagerSalaries m
JOIN
employees e
ON
m.manager_id = e.manager_id;The output will show each manager’s manager_id, their employees’ employee_id and salary, and the total salary managed by that manager.
Recursive Queries
Recursive queries use CTEs to process hierarchical or sequential data. They are particularly useful for analyzing tree-like structures, such as organizational charts or file systems.
Example Question: You have a departments table with columns: department_id and parent_department_id. Write a query to find all sub-departments under a given department.
Here’s how to solve this SQL problem:
WITH RecursiveDept AS (
SELECT
department_id,
parent_department_id
FROM
departments
WHERE
department_id = 1 -- Starting point
UNION ALL
SELECT
d.department_id,
d.parent_department_id
FROM
departments d
INNER JOIN
RecursiveDept r
ON
d.parent_department_id = r.department_id
)
SELECT
*
FROM
RecursiveDept;The output will show all departments and their parent-child relationships starting from the specified department (department_id = 1), including all its sub-departments recursively.
Handling JSON Data
Modern databases like PostgreSQL and MySQL support JSON data types, allowing you to store and query semi-structured data. It is critical for handling APIs or logs.
Example Question: You have a user_data table with a user_info column containing JSON data. Write a query to extract the email field and count users by their country.
Here’s how to solve this SQL problem:
SELECT
user_info->>'country' AS country,
COUNT(*) AS user_count
FROM
user_data
GROUP BY
user_info->>'country';The output will display each country extracted from the JSON column user_info and the corresponding count of users from that country.
Pivoting Data with CASE Statements
Pivoting transforms rows into columns, which helps create reports or summarize data. It can be achieved using CASE statements or database-specific pivot functions.
Example Question: You have a sales table with columns: store_id, sale_date, and amount. Write a query to show the total sales for each store, with separate columns for each month.
Here’s how to solve this SQL problem:
SELECT
store_id,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 1 THEN amount ELSE 0 END) AS january_sales,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 2 THEN amount ELSE 0 END) AS february_sales,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 3 THEN amount ELSE 0 END) AS march_sales
FROM
sales
GROUP BY
store_id;The output will show each store’s store_id along with total sales for January, February, and March as separate columns (january_sales, february_sales, march_sales).
Summary
So, in Data Science interviews, Interviewers often test your SQL knowledge to work with structured and semi-structured data, create readable queries, and handle real-world challenges like hierarchy, JSON, and summarization. I hope you liked this article on SQL concepts for Data Science interviews. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.





