SQL Subqueries Guide

SQL Subqueries are a powerful tool used to perform operations within a query by embedding one query within another. Data Analysts/Scientists use SQL subqueries for complex data retrieval and manipulation. So, if you want to learn about SQL subqueries, this article is for you. In this article, I’ll take you through a detailed practical guide to SQL subqueries for Data Science.

A subquery is a query nested inside another query. It can be used in various parts of a SQL statement, such as in the SELECT, FROM, WHERE, or HAVING clauses. The main types of SQL subqueries are:

Single-Row Subqueries
Multi-Row Subqueries
Correlated Subqueries
Nested Subqueries
Subqueries in the FROM Clause

Let’s understand all these practically one by one. Below are two example tables that I’ll use to show the practical implementation of all these SQL subqueries.

EmployeeID	EmployeeName	DepartmentID	Salary
1	John	1	70000
2	Sarah	2	80000
3	Mike	1	60000
4	Linda	3	90000

Table: Employees

DepartmentID	DepartmentName
1	HR
2	IT
3	Marketing

Table: Departments

Single-Row Subqueries

Single-row subqueries return only one row as a result. They are often used in the WHERE clause. Here’s an example to find the employees whose salary is above the average salary:

SELECT EmployeeName, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);

EmployeeName	Salary
Sarah	80000
Linda	90000

Here, the subquery (SELECT AVG(Salary) FROM Employees) returns the average salary, which is then used to filter employees whose salary is above that average.

Multi-Row Subqueries

Multi-row subqueries return more than one row. They are often used with IN, ANY, or ALL operators. Here’s an example to find the employees who belong to departments with more than one employee:

SELECT EmployeeName
FROM Employees
WHERE DepartmentID IN (
    SELECT DepartmentID
    FROM Employees
    GROUP BY DepartmentID
    HAVING COUNT(*) > 1
);

EmployeeName
John
Mike

Here, the subquery identifies departments with more than one employee, and the main query retrieves the names of employees in those departments.

Correlated Subqueries

Correlated subqueries reference columns from the outer query. They are evaluated once for each row processed by the outer query. Here’s an example to find employees whose salary is higher than the average salary of their department:

SELECT EmployeeName, Salary
FROM Employees e1
WHERE Salary > (
    SELECT AVG(Salary)
    FROM Employees e2
    WHERE e1.DepartmentID = e2.DepartmentID
);

EmployeeName	Salary
Sarah	80000
Linda	90000

Here, the subquery calculates the average salary within the department of each employee, and the main query selects those whose salary exceeds that average.

Nested Subqueries

Nested subqueries involve multiple layers of subqueries within one another. Here’s an example to find the department with the highest average salary:

SELECT DepartmentName
FROM Departments
WHERE DepartmentID = (
    SELECT DepartmentID
    FROM Employees
    GROUP BY DepartmentID
    ORDER BY AVG(Salary) DESC
    LIMIT 1
);

Output: Marketing

Here, the innermost subquery calculates the average salary per department, and the outer query retrieves the department name with the highest average salary.

Subqueries in the FROM Clause

Subqueries can also be used in the FROM clause to create a temporary table or result set. Here’s an example to find the average salary of each department and include it in the main query:

SELECT d.DepartmentName, sub.AvgSalary
FROM Departments d
JOIN (
    SELECT DepartmentID, AVG(Salary) AS AvgSalary
    FROM Employees
    GROUP BY DepartmentID
) sub
ON d.DepartmentID = sub.DepartmentID;

DepartmentName	AvgSalary
HR	65000
IT	80000
Marketing	90000

Here, the subquery in the FROM clause calculates the average salary per department, and the main query joins this result with the Departments table.

Summary

So, here are the SQL subqueries you should know for Data Science:

Single-Row Subqueries: Return a single row and are often used in WHERE clauses.
Multi-Row Subqueries: Return multiple rows, typically used with IN, ANY, or ALL.
Correlated Subqueries: Reference columns from the outer query and are evaluated row by row.
Nested Subqueries: Involve multiple layers of subqueries within each other.
Subqueries in the FROM Clause: Used to create temporary tables or result sets within a query.

I hope you liked this article on a guide to SQL subqueries. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.