SQL Subqueries are a powerful tool used to perform operations within a query by embedding one query within another. Data Analysts/Scientists use SQL subqueries for complex data retrieval and manipulation. So, if you want to learn about SQL subqueries, this article is for you. In this article, I’ll take you through a detailed practical guide to SQL subqueries for Data Science.
SQL Subqueries Guide
A subquery is a query nested inside another query. It can be used in various parts of a SQL statement, such as in the SELECT, FROM, WHERE, or HAVING clauses. The main types of SQL subqueries are:
- Single-Row Subqueries
- Multi-Row Subqueries
- Correlated Subqueries
- Nested Subqueries
- Subqueries in the FROM Clause
Let’s understand all these practically one by one. Below are two example tables that I’ll use to show the practical implementation of all these SQL subqueries.
| EmployeeID | EmployeeName | DepartmentID | Salary |
|---|---|---|---|
| 1 | John | 1 | 70000 |
| 2 | Sarah | 2 | 80000 |
| 3 | Mike | 1 | 60000 |
| 4 | Linda | 3 | 90000 |
| DepartmentID | DepartmentName |
|---|---|
| 1 | HR |
| 2 | IT |
| 3 | Marketing |
Single-Row Subqueries
Single-row subqueries return only one row as a result. They are often used in the WHERE clause. Here’s an example to find the employees whose salary is above the average salary:
SELECT EmployeeName, Salary FROM Employees WHERE Salary > (SELECT AVG(Salary) FROM Employees);
| EmployeeName | Salary |
|---|---|
| Sarah | 80000 |
| Linda | 90000 |
Here, the subquery (SELECT AVG(Salary) FROM Employees) returns the average salary, which is then used to filter employees whose salary is above that average.
Multi-Row Subqueries
Multi-row subqueries return more than one row. They are often used with IN, ANY, or ALL operators. Here’s an example to find the employees who belong to departments with more than one employee:
SELECT EmployeeName
FROM Employees
WHERE DepartmentID IN (
SELECT DepartmentID
FROM Employees
GROUP BY DepartmentID
HAVING COUNT(*) > 1
);| EmployeeName |
|---|
| John |
| Mike |
Here, the subquery identifies departments with more than one employee, and the main query retrieves the names of employees in those departments.
Correlated Subqueries
Correlated subqueries reference columns from the outer query. They are evaluated once for each row processed by the outer query. Here’s an example to find employees whose salary is higher than the average salary of their department:
SELECT EmployeeName, Salary
FROM Employees e1
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees e2
WHERE e1.DepartmentID = e2.DepartmentID
);| EmployeeName | Salary |
|---|---|
| Sarah | 80000 |
| Linda | 90000 |
Here, the subquery calculates the average salary within the department of each employee, and the main query selects those whose salary exceeds that average.
Nested Subqueries
Nested subqueries involve multiple layers of subqueries within one another. Here’s an example to find the department with the highest average salary:
SELECT DepartmentName
FROM Departments
WHERE DepartmentID = (
SELECT DepartmentID
FROM Employees
GROUP BY DepartmentID
ORDER BY AVG(Salary) DESC
LIMIT 1
);Output: Marketing
Here, the innermost subquery calculates the average salary per department, and the outer query retrieves the department name with the highest average salary.
Subqueries in the FROM Clause
Subqueries can also be used in the FROM clause to create a temporary table or result set. Here’s an example to find the average salary of each department and include it in the main query:
SELECT d.DepartmentName, sub.AvgSalary
FROM Departments d
JOIN (
SELECT DepartmentID, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY DepartmentID
) sub
ON d.DepartmentID = sub.DepartmentID;| DepartmentName | AvgSalary |
|---|---|
| HR | 65000 |
| IT | 80000 |
| Marketing | 90000 |
Here, the subquery in the FROM clause calculates the average salary per department, and the main query joins this result with the Departments table.
Summary
So, here are the SQL subqueries you should know for Data Science:
- Single-Row Subqueries: Return a single row and are often used in WHERE clauses.
- Multi-Row Subqueries: Return multiple rows, typically used with IN, ANY, or ALL.
- Correlated Subqueries: Reference columns from the outer query and are evaluated row by row.
- Nested Subqueries: Involve multiple layers of subqueries within each other.
- Subqueries in the FROM Clause: Used to create temporary tables or result sets within a query.
I hope you liked this article on a guide to SQL subqueries. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.





