If you are preparing for data science interviews, you need to practice interview problems based on SQL, pandas, NumPy, statistics, and many more foundational concepts in data science. So, if you are looking for Data Science interview problems, this article is for you. In this article, I’ll take you through 20 problems to help you crack Data Science interviews.
20 Problems to Crack Data Science Interviews
Below are 20 problems to help you crack Data Science interviews, along with solutions in Python and SQL.
Problems based on SQL
Problem 1: You have a sales table with columns: sale_id, sale_date, store_id, and amount. Write a query to calculate the running total of sales for each store ordered by date.
Here’s how to solve this SQL problem:
SELECT
store_id,
sale_date,
amount,
SUM(amount) OVER (PARTITION BY store_id ORDER BY sale_date) AS running_total
FROM
sales
ORDER BY
store_id, sale_date;
The output will show each store’s store_id, sale_date, amount, and a cumulative running_total of sales up to each date, ordered by store_id and sale_date.
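Interviewers often follow up by asking for the pandas equivalent of a window function. A minimal sketch, assuming a small made-up DataFrame with the same columns as the sales table:

```python
import pandas as pd

# hypothetical rows mirroring the sales table
sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2],
    "sale_date": pd.to_datetime(["2024-01-01", "2024-01-02",
                                 "2024-01-01", "2024-01-03"]),
    "amount": [100.0, 50.0, 200.0, 75.0],
})

# sort by date within each store, then take a cumulative sum per store
sales = sales.sort_values(["store_id", "sale_date"])
sales["running_total"] = sales.groupby("store_id")["amount"].cumsum()
print(sales)
```

Here groupby + cumsum plays the role of SUM(...) OVER (PARTITION BY ... ORDER BY ...).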
Problem 2: You have an employees table with columns: employee_id, manager_id, and salary. Write a query to calculate the total salary expense for each manager, including their direct reports.
Here’s how to solve this SQL problem:
WITH ManagerSalaries AS (
SELECT
manager_id,
SUM(salary) AS total_salary
FROM
employees
GROUP BY
manager_id
)
SELECT
m.manager_id,
e.employee_id,
e.salary,
m.total_salary
FROM
ManagerSalaries m
JOIN
employees e
ON
m.manager_id = e.manager_id;
The output will show each manager’s manager_id, their employees’ employee_id and salary, and the total salary of that manager’s direct reports. Note that the query sums only the direct reports’ salaries; to include the manager’s own salary, you would also join in the manager’s row from employees.
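The same aggregation can be sketched in pandas with groupby and transform, using hypothetical employee rows:

```python
import pandas as pd

# hypothetical rows mirroring the employees table
employees = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "manager_id": [10, 10, 20, 20],
    "salary": [50000, 60000, 55000, 45000],
})

# transform broadcasts each manager's total back onto every report's row,
# matching the CTE-plus-join shape of the SQL solution
employees["total_salary"] = employees.groupby("manager_id")["salary"].transform("sum")
print(employees)
```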
Problem 3: You have a departments table with columns: department_id and parent_department_id. Write a query to find all sub-departments under a given department.
Here’s how to solve this SQL problem:
WITH RecursiveDept AS (
SELECT
department_id,
parent_department_id
FROM
departments
WHERE
department_id = 1 -- Starting point
UNION ALL
SELECT
d.department_id,
d.parent_department_id
FROM
departments d
INNER JOIN
RecursiveDept r
ON
d.parent_department_id = r.department_id
)
SELECT
*
FROM
RecursiveDept;
The output will show all departments and their parent-child relationships starting from the specified department (department_id = 1), including all of its sub-departments recursively.
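If the interviewer asks how you would do this outside SQL, a recursive CTE maps naturally onto a breadth-first traversal in Python. A sketch with hypothetical (department_id, parent_department_id) rows:

```python
from collections import defaultdict, deque

# hypothetical (department_id, parent_department_id) rows
rows = [(1, None), (2, 1), (3, 1), (4, 2), (5, 4), (6, None)]

# index children by parent, like the join condition in the recursive CTE
children = defaultdict(list)
for dept, parent in rows:
    children[parent].append(dept)

def sub_departments(root):
    """Collect every department below root, breadth-first."""
    found, queue = [], deque([root])
    while queue:
        for child in children[queue.popleft()]:
            found.append(child)
            queue.append(child)
    return found

print(sub_departments(1))  # [2, 3, 4, 5]
```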
Problem 4: You have a user_data table with a user_info column containing JSON data. Write a query to extract the email field and count users by their country.
Here’s how to solve this SQL problem:
SELECT
user_info->>'country' AS country,
COUNT(*) AS user_count
FROM
user_data
GROUP BY
user_info->>'country';
The output will display each country extracted from the JSON column user_info and the corresponding count of users from that country. The ->> operator is PostgreSQL syntax for extracting a JSON field as text; the email field can be extracted the same way with user_info->>'email'.
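In pandas, the same counting can be sketched by parsing the JSON strings first (the rows below are made up for illustration):

```python
import json
import pandas as pd

# hypothetical rows mimicking the user_data table's JSON column
user_data = pd.DataFrame({
    "user_info": [
        '{"email": "a@example.com", "country": "US"}',
        '{"email": "b@example.com", "country": "IN"}',
        '{"email": "c@example.com", "country": "US"}',
    ]
})

# parse each JSON string, then pull out the fields of interest
parsed = user_data["user_info"].apply(json.loads)
user_data["email"] = parsed.apply(lambda d: d["email"])
user_data["country"] = parsed.apply(lambda d: d["country"])

counts = user_data["country"].value_counts()
print(counts)
```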
Problem 5: You have a sales table with columns: store_id, sale_date, and amount. Write a query to show the total sales for each store, with separate columns for each month.
Here’s how to solve this SQL problem:
SELECT
store_id,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 1 THEN amount ELSE 0 END) AS january_sales,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 2 THEN amount ELSE 0 END) AS february_sales,
SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 3 THEN amount ELSE 0 END) AS march_sales
FROM
sales
GROUP BY
store_id;
The output will show each store’s store_id along with total sales for January, February, and March as separate columns (january_sales, february_sales, march_sales).
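In pandas, the same month-to-columns pivot can be sketched with pivot_table (the data below is made up):

```python
import pandas as pd

# made-up sales rows
sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2],
    "sale_date": pd.to_datetime(["2024-01-05", "2024-02-10",
                                 "2024-01-20", "2024-03-15"]),
    "amount": [100, 200, 300, 400],
})
sales["month"] = sales["sale_date"].dt.month_name()

# one row per store, one column per month, zeros where a store had no sales
pivot = sales.pivot_table(index="store_id", columns="month",
                          values="amount", aggfunc="sum", fill_value=0)
print(pivot)
```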
Problems based on Statistics
Problem 6: You are analyzing the performance of three different advertising strategies (A, B, C) based on the number of product purchases. Use ANOVA to determine if the strategies lead to significantly different results.
Here’s how to solve this problem using Python:
from scipy.stats import f_oneway
# sample data: number of purchases for three strategies
strategy_A = [30, 28, 35, 29, 34]
strategy_B = [25, 22, 27, 24, 30]
strategy_C = [40, 42, 45, 41, 46]
# perform ANOVA
f_stat, p_value = f_oneway(strategy_A, strategy_B, strategy_C)
print("F-Statistic:", f_stat)
print("P-Value:", p_value)
# interpretation
if p_value < 0.05:
    print("Significant differences exist among strategies.")
else:
    print("No significant differences found.")
F-Statistic: 44.91828793774321
P-Value: 2.6771009397609933e-06
Significant differences exist among strategies.
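To show you understand what f_oneway computes, it helps to be able to derive the F-statistic by hand from the between-group and within-group sums of squares:

```python
import numpy as np
from scipy.stats import f_oneway

groups = [np.array([30, 28, 35, 29, 34]),
          np.array([25, 22, 27, 24, 30]),
          np.array([40, 42, 45, 41, 46])]

grand_mean = np.concatenate(groups).mean()

# between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)

# F = mean square between / mean square within
f_manual = (ss_between / df_between) / (ss_within / df_within)
f_scipy, _ = f_oneway(*groups)
print(f_manual, f_scipy)  # both ~44.92
```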
Problem 7: You are running an email spam filter. Initially, you assume a 50% chance an email is spam. If 80% of spam emails contain the word “sale” and 20% of non-spam emails also contain “sale”, calculate the updated probability that an email containing “sale” is spam.
Here’s how to solve this problem using Python:
# Bayesian Inference Formula
# P(Spam | Sale) = [P(Sale | Spam) * P(Spam)] / P(Sale)
# probabilities
P_spam = 0.5
P_sale_given_spam = 0.8
P_sale_given_not_spam = 0.2
P_not_spam = 1 - P_spam
# total probability of 'Sale'
P_sale = (P_sale_given_spam * P_spam) + (P_sale_given_not_spam * P_not_spam)
# posterior probability
P_spam_given_sale = (P_sale_given_spam * P_spam) / P_sale
print("Updated Probability of Spam given 'Sale':", P_spam_given_sale)
Updated Probability of Spam given 'Sale': 0.8
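A quick Monte Carlo simulation is a good way to sanity-check the Bayes result: generate emails with the stated probabilities and measure the empirical share of spam among emails containing “sale”.

```python
import random

random.seed(0)

# simulate 100,000 emails with the stated probabilities
spam_and_sale = not_spam_and_sale = 0
for _ in range(100_000):
    is_spam = random.random() < 0.5
    has_sale = random.random() < (0.8 if is_spam else 0.2)
    if has_sale:
        if is_spam:
            spam_and_sale += 1
        else:
            not_spam_and_sale += 1

estimate = spam_and_sale / (spam_and_sale + not_spam_and_sale)
print("Empirical P(Spam | Sale):", estimate)  # close to 0.8
```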
Problem 8: Estimate the 95% confidence interval for the mean income from a dataset of 1,000 individuals using bootstrapping.
Here’s how to solve this problem using Python:
import numpy as np
# sample data: income of 1,000 individuals
np.random.seed(42)
incomes = np.random.normal(50000, 15000, 1000)
# bootstrapping
bootstrap_means = [np.mean(np.random.choice(incomes, size=len(incomes), replace=True)) for _ in range(1000)]
# confidence interval
ci_lower = np.percentile(bootstrap_means, 2.5)
ci_upper = np.percentile(bootstrap_means, 97.5)
print("95% Confidence Interval for Mean Income:", (ci_lower, ci_upper))
95% Confidence Interval for Mean Income: (49410.580818461574, 51275.35146924302)
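As a cross-check, the classical t-based interval should land close to the bootstrap interval for a sample this large:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
incomes = np.random.normal(50000, 15000, 1000)

# classical t-based 95% interval for the mean
mean = incomes.mean()
sem = stats.sem(incomes)
t_lower, t_upper = stats.t.interval(0.95, df=len(incomes) - 1,
                                    loc=mean, scale=sem)
print("t-based 95% CI:", (t_lower, t_upper))
```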
Problem 9: You have data on customer churn (time in months before they stopped using the service). Calculate the survival probability for each month using the Kaplan-Meier estimator.
Here’s how to solve this problem using Python:
# use: pip install lifelines
import numpy as np
from lifelines import KaplanMeierFitter

# sample churn data: time in months and event occurrence
time = [5, 6, 6, 2, 4, 8, 10, 3, 5, 7]
event = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # 1 = churned, 0 = censored

# Kaplan-Meier estimator
kmf = KaplanMeierFitter()
kmf.fit(time, event_observed=event)

# survival probabilities
kmf.plot_survival_function()
print(kmf.survival_function_)
          KM_estimate
timeline
0.0               1.0
2.0               0.9
3.0               0.8
4.0               0.7
5.0               0.5
6.0               0.4
7.0               0.4
8.0               0.4
10.0              0.0
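lifelines hides the arithmetic, but the estimator itself is simple enough to code directly, which makes a good whiteboard exercise: at each event time t, multiply the running survival probability by (1 - deaths / at_risk).

```python
# Kaplan-Meier survival probabilities computed by hand
time = [5, 6, 6, 2, 4, 8, 10, 3, 5, 7]
event = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # 1 = churned, 0 = censored

pairs = sorted(zip(time, event))
at_risk = len(pairs)
survival = 1.0
curve = {}
i = 0
while i < len(pairs):
    t = pairs[i][0]
    deaths = censored = 0
    # group all subjects that share this time point
    while i < len(pairs) and pairs[i][0] == t:
        if pairs[i][1] == 1:
            deaths += 1
        else:
            censored += 1
        i += 1
    if deaths:
        survival *= 1 - deaths / at_risk
    curve[t] = survival
    at_risk -= deaths + censored

print(curve)
```

The resulting probabilities match the lifelines table above (0.9 at month 2, 0.5 at month 5, and so on).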
Problem 10: You have a dataset with 10 features. Use PCA to reduce its dimensionality while retaining at least 95% of the variance.
Here’s how to solve this problem using Python:
from sklearn.decomposition import PCA
from sklearn.datasets import make_classification
# generate sample data
X, _ = make_classification(n_samples=500, n_features=10, random_state=42)
# apply PCA
pca = PCA(n_components=0.95) # retain at least 95% variance
X_pca = pca.fit_transform(X)
# results
print("Number of Components:", pca.n_components_)
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
Number of Components: 8
Explained Variance Ratio: [0.22552466 0.20268672 0.11512407 0.10510183 0.09660526 0.08947361
0.08811409 0.07736976]
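Note that passing a float to n_components sets a variance threshold, which is why 8 components are kept here. If the interviewer insists on exactly 2 components, pass an integer instead and report how much variance those 2 retain:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# same synthetic data as above
X, _ = make_classification(n_samples=500, n_features=10, random_state=42)

# an integer n_components fixes the count instead of a variance threshold
pca2 = PCA(n_components=2)
X_2d = pca2.fit_transform(X)
var2 = pca2.explained_variance_ratio_.sum()
print(X_2d.shape, var2)
```

With this data the top two components capture only about 43% of the variance (the first two ratios printed above), which is exactly the trade-off worth pointing out in an interview.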
Problems based on Data Cleaning
Problem 11: You have a dataset with missing values in the age column. Use KNN imputation to fill in the missing age values based on similarities in other columns like income and education level.
Here’s how to solve this problem using Python:
import pandas as pd
import numpy as np
from sklearn.impute import KNNImputer
# sample data
data = {'age': [25, np.nan, 28, 32, np.nan, 30],
'income': [50000, 55000, 62000, 60000, 64000, 59000],
'education_level': [1, 2, 2, 3, 3, 2]}
df = pd.DataFrame(data)
# KNN imputer for missing age values
imputer = KNNImputer(n_neighbors=2)
cols = ['age', 'income', 'education_level']
df[cols] = imputer.fit_transform(df[cols])
print(df)
    age   income  education_level
0  25.0  50000.0              1.0
1  31.0  55000.0              2.0
2  28.0  62000.0              2.0
3  32.0  60000.0              3.0
4  30.0  64000.0              3.0
5  30.0  59000.0              2.0
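One caveat worth mentioning in an interview: KNN imputation is distance-based, so a large-scale feature like income dominates the neighbour search. A common remedy (sketched here, not part of the original solution) is to standardize before imputing and invert the scaling afterwards:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'age': [25, np.nan, 28, 32, np.nan, 30],
                   'income': [50000, 55000, 62000, 60000, 64000, 59000],
                   'education_level': [1, 2, 2, 3, 3, 2]})

# scale first so income does not dominate the distance metric,
# then impute, then invert the scaling (StandardScaler ignores NaNs in fit)
scaler = StandardScaler()
scaled = scaler.fit_transform(df)
imputed = KNNImputer(n_neighbors=2).fit_transform(scaled)
df_imputed = pd.DataFrame(scaler.inverse_transform(imputed), columns=df.columns)
print(df_imputed)
```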
Problem 12: You have a dataset of monthly incomes where some entries appear to be outliers. Use the IQR method to detect and handle these outliers by capping them within a specified range.
Here’s how to solve this problem using Python:
# sample income data with outliers
income_data = {'monthly_income': [3000, 3200, 3100, 15000, 2800, 2700, 3400, 2500, 35000]}
df = pd.DataFrame(income_data)
# calculate IQR
Q1 = df['monthly_income'].quantile(0.25)
Q3 = df['monthly_income'].quantile(0.75)
IQR = Q3 - Q1
# define bounds and cap outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
df['monthly_income'] = np.where(df['monthly_income'] > upper_bound, upper_bound,
np.where(df['monthly_income'] < lower_bound, lower_bound, df['monthly_income']))
print(df)
   monthly_income
0          3000.0
1          3200.0
2          3100.0
3          4300.0
4          2800.0
5          2700.0
6          3400.0
7          2500.0
8          4300.0
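pandas also has a built-in clip method that expresses the same capping more concisely than nested np.where calls:

```python
import pandas as pd

df = pd.DataFrame({'monthly_income': [3000, 3200, 3100, 15000, 2800,
                                      2700, 3400, 2500, 35000]})

Q1, Q3 = df['monthly_income'].quantile([0.25, 0.75])
IQR = Q3 - Q1

# clip() caps values outside the IQR fences in one call
df['monthly_income'] = df['monthly_income'].clip(lower=Q1 - 1.5 * IQR,
                                                 upper=Q3 + 1.5 * IQR)
print(df)
```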
Problem 13: You have a dataset with customer names that may contain near-duplicates due to typos (e.g., “John Doe” and “Jon Doe”). Use fuzzy matching to identify these near-duplicates and retain only one unique entry per group.
Here’s how to solve this problem using Python:
# install fuzzywuzzy: pip install fuzzywuzzy
from fuzzywuzzy import fuzz, process
# sample data with similar names
data = {'customer_name': ['John Doe', 'Jon Doe', 'Jane Doe', 'Janet Doe', 'Jake Doe']}
df = pd.DataFrame(data)
# identify potential duplicates using fuzzy matching
unique_names = []
for name in df['customer_name']:
    match = process.extractOne(name, unique_names, scorer=fuzz.token_sort_ratio)
    if match and match[1] > 85:  # similarity threshold
        print(f"Duplicate found: {name} -> {match[0]}")
    else:
        unique_names.append(name)

print("Unique names:", unique_names)
Duplicate found: Jon Doe -> John Doe
Duplicate found: Janet Doe -> Jane Doe
Duplicate found: Jake Doe -> Jane Doe
Unique names: ['John Doe', 'Jane Doe']
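If installing fuzzywuzzy isn’t an option, the standard library’s difflib gives a similar similarity score; here is a sketch of the same dedup loop (note the threshold is on a 0–1 scale here):

```python
from difflib import SequenceMatcher

names = ['John Doe', 'Jon Doe', 'Jane Doe', 'Janet Doe', 'Jake Doe']

def similarity(a, b):
    # ratio() returns a score in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# keep a name only if it is not too similar to one already kept
unique_names = []
for name in names:
    if not any(similarity(name, kept) > 0.85 for kept in unique_names):
        unique_names.append(name)

print("Unique names:", unique_names)
```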
Problem 14: You have a dataset with user reviews. The text contains various issues like inconsistent capitalization, special characters, and extra whitespace. Write a function to clean and normalize this text data for analysis.
Here’s how to solve this problem using Python:
import re
# sample text data
reviews = ["Great Product!!", " loved it... would buy again!!!", "Not BAD, but could be BETTER :)", "awesome!!!"]
# function to clean and normalize text
def clean_text(text):
    text = text.lower()                       # convert to lowercase
    text = re.sub(r'[^a-z\s]', '', text)      # remove special characters
    text = re.sub(r'\s+', ' ', text).strip()  # remove extra whitespace
    return text
# apply cleaning function
cleaned_reviews = [clean_text(review) for review in reviews]
print(cleaned_reviews)
['great product', 'loved it would buy again', 'not bad but could be better', 'awesome']
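In practice the same cleaning is usually applied to a pandas column; the .str accessor vectorizes each step:

```python
import pandas as pd

reviews = pd.Series(["Great Product!!", " loved it... would buy again!!!",
                     "Not BAD, but could be BETTER :)", "awesome!!!"])

# the same three cleaning steps, vectorized over the whole column
cleaned = (reviews.str.lower()
                  .str.replace(r'[^a-z\s]', '', regex=True)
                  .str.replace(r'\s+', ' ', regex=True)
                  .str.strip())
print(cleaned.tolist())
```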
Problem 15: You have a dataset with numerical features of varying scales (e.g., age and income). Apply Min-Max scaling to transform each feature to a range of 0 to 1.
Here’s how to solve this problem using Python:
from sklearn.preprocessing import MinMaxScaler
# sample data
data = {'age': [20, 30, 40, 50, 60], 'income': [20000, 30000, 50000, 80000, 120000]}
df = pd.DataFrame(data)
# apply min-max scaling
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df)
    age  income
0  0.00     0.0
1  0.25     0.1
2  0.50     0.3
3  0.75     0.6
4  1.00     1.0
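Min-Max scaling is also easy to compute directly in NumPy, which makes the underlying formula (x - min) / (max - min) explicit:

```python
import numpy as np

# same age/income data as above, as a float array
data = np.array([[20, 20000], [30, 30000], [40, 50000],
                 [50, 80000], [60, 120000]], dtype=float)

# (x - min) / (max - min), computed per column via broadcasting
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)
print(scaled)
```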
Problems based on NumPy
Problem 16: You have a 2D array of daily temperatures for 5 cities over 7 days and a 1D array of daily adjustment offsets. Use NumPy broadcasting to apply the offsets to every city’s temperatures.
Here’s how to solve this problem using NumPy and Python:
import numpy as np
# 2D array of temperatures (5 cities, 7 days)
temperatures = np.array([
[20, 21, 19, 22, 18, 20, 19],
[25, 26, 24, 23, 22, 27, 26],
[15, 17, 16, 18, 15, 16, 17],
[30, 31, 29, 32, 30, 28, 27],
[22, 23, 21, 25, 24, 26, 25]
])
# 1D array with daily offsets
offsets = np.array([1, -1, 0, 2, -2, 1, 0])
# adjust temperatures using broadcasting
adjusted_temperatures = temperatures + offsets
print(adjusted_temperatures)
[[21 20 19 24 16 21 19]
[26 25 24 25 20 28 26]
[16 16 16 20 13 17 17]
[31 30 29 34 28 29 27]
[23 22 21 27 22 27 25]]
Problem 17: You have an array of distances in kilometres and need to convert them to miles without using a loop. Use vectorized operations in NumPy for the conversion (1 km = 0.621371 miles).
Here’s how to solve this problem using NumPy and Python:
# array of distances in kilometers
distances_km = np.array([5, 10, 15, 20, 25])

# convert to miles using a vectorized operation
distances_miles = distances_km * 0.621371
print(distances_miles)
[ 3.106855 6.21371 9.320565 12.42742 15.534275]
Problem 18: You have an array of ages, and you want to find all ages that are above 18 and increase them by 1 to simulate a birthday. Use Boolean indexing to achieve this.
Here’s how to solve this problem using NumPy and Python:
# array of ages
ages = np.array([15, 20, 17, 19, 21, 18, 23, 16])

# increase ages above 18 by 1 using boolean indexing
ages[ages > 18] += 1
print(ages)
[15 21 17 20 22 18 24 16]
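Boolean indexing mutates the array in place; if you need to keep the original ages, np.where builds a new array instead (a small variation, not part of the original solution):

```python
import numpy as np

ages = np.array([15, 20, 17, 19, 21, 18, 23, 16])

# np.where returns a new array, leaving ages unchanged
bumped = np.where(ages > 18, ages + 1, ages)
print(bumped)  # [15 21 17 20 22 18 24 16]
```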
Problem 19: Given a 3D array representing monthly sales data (axis 0: stores, axis 1: months, axis 2: products), calculate the total sales for each product across all stores for each month.
Here’s how to solve this problem using NumPy and Python:
# sample 3D array: 2 stores, 3 months, 4 products
sales_data = np.array([
[[200, 150, 100, 80], [210, 160, 110, 90], [220, 170, 120, 95]],
[[180, 140, 90, 70], [190, 150, 100, 85], [200, 160, 110, 80]]
])
# sum sales for each product across all stores for each month
product_sales_monthly = np.sum(sales_data, axis=0)
print(product_sales_monthly)
[[380 290 190 150]
[400 310 210 175]
[420 330 230 175]]
Problem 20: You have a 2D array representing grayscale pixel values (8×8 matrix). Reshape it into a 4D array where each 4×4 sub-array represents a quadrant of the image.
Here’s how to solve this problem using NumPy and Python:
# sample 8x8 array representing grayscale pixel values
pixels = np.arange(64).reshape(8, 8)

# reshape into a 4D array (2, 2, 4, 4) so each 4x4 block is one quadrant
quadrants = pixels.reshape(2, 4, 2, 4).transpose(0, 2, 1, 3)
print(quadrants)
[[[[ 0 1 2 3]
[ 8 9 10 11]
[16 17 18 19]
[24 25 26 27]]
[[ 4 5 6 7]
[12 13 14 15]
[20 21 22 23]
[28 29 30 31]]]
[[[32 33 34 35]
[40 41 42 43]
[48 49 50 51]
[56 57 58 59]]
[[36 37 38 39]
[44 45 46 47]
[52 53 54 55]
[60 61 62 63]]]]
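A good way to convince an interviewer that the reshape/transpose combination is correct is to compare each quadrant against plain slicing:

```python
import numpy as np

pixels = np.arange(64).reshape(8, 8)
quadrants = pixels.reshape(2, 4, 2, 4).transpose(0, 2, 1, 3)

# each quadrant should match the corresponding 4x4 slice
assert (quadrants[0, 0] == pixels[:4, :4]).all()  # top-left
assert (quadrants[0, 1] == pixels[:4, 4:]).all()  # top-right
assert (quadrants[1, 0] == pixels[4:, :4]).all()  # bottom-left
assert (quadrants[1, 1] == pixels[4:, 4:]).all()  # bottom-right
print("all four quadrants verified")
```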
So, these are 20 problems you can practice to crack Data Science interviews. All of them are popular in Data Science interviews, especially for freshers.
Summary
So, if you are preparing for data science interviews, practice these interview problems based on SQL, pandas, NumPy, statistics, and other foundational concepts in data science. I hope you liked this article on 20 problems to crack Data Science interviews. This list will be updated every week with more questions. Feel free to ask your questions in the comments section below. You can follow me on Instagram for many more resources.