SQL Subqueries & CTEs
SQL Subqueries & CTEs
Subqueries — Inline Queries
Scalar Subquery — Returns Single Value
-- Employees earning above average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Salary difference from company average
SELECT name, salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
Multi-Row Subquery — Returns List (use with IN / NOT IN / ANY / ALL)
-- Employees in Engineering or Product departments
SELECT name FROM employees
WHERE department_id IN (
SELECT id FROM departments WHERE name IN ('Engineering', 'Product')
);
-- Salary greater than ANY senior employee (i.e., greater than min senior salary)
SELECT name, salary FROM employees
WHERE salary > ANY (SELECT salary FROM employees WHERE level = 'Senior');
-- Salary greater than ALL senior employees (i.e., greater than max senior salary)
SELECT name, salary FROM employees
WHERE salary > ALL (SELECT salary FROM employees WHERE level = 'Senior');
Correlated Subquery — References Outer Query
Runs once per row of the outer query. Slower than JOIN for large tables.
-- Employees earning above their department's average
SELECT e.name, e.salary, e.department
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department -- ← references outer query's row
);
-- Latest order per user (correlated)
SELECT o.*
FROM orders o
WHERE o.created_at = (
SELECT MAX(created_at)
FROM orders
WHERE user_id = o.user_id -- ← per-user max
);
EXISTS / NOT EXISTS
-- Users who have placed at least one order
SELECT u.name
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
-- Users with no orders (anti-join, handles NULLs safely)
SELECT u.name
FROM users u
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
EXISTS vs IN:
EXISTSshort-circuits on first match — faster when subquery returns large result setINmaterializes entire subquery — efficient when subquery is smallNOT EXISTSis NULL-safe;NOT INbreaks if subquery contains NULLs
CTEs — Common Table Expressions
Named subqueries at the top of the query. Evaluated once, reusable.
-- Basic CTE: users who ordered > $1000 total
WITH high_value_users AS (
SELECT user_id, SUM(amount) AS total_spent
FROM orders
GROUP BY user_id
HAVING SUM(amount) > 1000
)
SELECT u.name, h.total_spent
FROM users u
JOIN high_value_users h ON u.id = h.user_id;
Multiple CTEs
WITH
active_users AS (
SELECT user_id
FROM sessions
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY user_id
HAVING COUNT(*) >= 3
),
user_revenue AS (
SELECT user_id, SUM(amount) AS revenue
FROM orders
GROUP BY user_id
)
SELECT u.name, r.revenue
FROM users u
JOIN active_users a ON u.id = a.user_id
JOIN user_revenue r ON u.id = r.user_id
ORDER BY r.revenue DESC;
CTE vs Subquery — When to Use Which
| Subquery | CTE | |
|---|---|---|
| Readability | Nested, harder to read | Named, top-level, easier to follow |
| Reuse | Can't reuse | Reuse by referencing same CTE name multiple times |
| Debugging | Hard to test in isolation | Can comment out main query and SELECT from CTE |
| Performance | Usually same (optimizer flattens both) | Same in most DBs; some DBs materialize CTEs |
Rule of thumb: CTE when logic is complex, reused, or benefits from a name. Subquery for simple one-off inline filters.
Recursive CTEs
Traverse hierarchical data (org charts, folder trees, graphs).
WITH RECURSIVE org_tree AS (
-- Anchor: start node
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE manager_id IS NULL -- top of the org
UNION ALL
-- Recursive: join against previous iteration's results
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY depth, name;
How Recursive CTE Executes
1. Run anchor → working_table = {CEO}
2. Join employees against working_table → new rows = {VP1, VP2}
working_table = {VP1, VP2} (only NEW rows each pass)
result_table += {VP1, VP2}
3. Join against {VP1, VP2} → new rows = {Dir1, Dir2, Dir3}
working_table = {Dir1, Dir2, Dir3}
result_table += {Dir1, Dir2, Dir3}
4. Continue until no new rows
Key: working_table = previous iteration's output only (not all accumulated rows). This is why it terminates for trees.
Recursive CTE — Find Path Between Nodes
WITH RECURSIVE path AS (
SELECT id, name, manager_id, ARRAY[id] AS visited
FROM employees
WHERE id = @start_id
UNION ALL
SELECT e.id, e.name, e.manager_id, p.visited || e.id
FROM employees e
JOIN path p ON e.manager_id = p.id
WHERE NOT e.id = ANY(p.visited) -- cycle prevention
)
SELECT * FROM path;
Sequence Generation
-- Generate numbers 1 to 100
WITH RECURSIVE nums AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM nums WHERE n < 100
)
SELECT * FROM nums;
-- Generate date range
WITH RECURSIVE dates AS (
SELECT '2026-01-01'::DATE AS d
UNION ALL
SELECT d + 1 FROM dates WHERE d < '2026-01-31'
)
SELECT * FROM dates;
Subquery in FROM Clause (Derived Table)
-- Same as CTE but inline
SELECT dept, avg_sal
FROM (
SELECT department AS dept, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
) dept_stats
WHERE avg_sal > 70000;
Interview Questions
Q: CTE vs subquery performance?
→ Usually identical — optimizer treats them the same. Exception: some DBs (older PostgreSQL) materialize CTEs (compute once, no re-optimization). PostgreSQL 12+ CTEs are inlined by default; use MATERIALIZED keyword to force old behavior.
Q: When is recursive CTE infinite loop risk?
→ Cyclic data (A→B→A). Guard with: cycle detection column (array of visited IDs), or MAXRECURSION limit (SQL Server default 100).
Related
- [[JOINs]] — anti-join pattern alternatives
- [[Window Functions]] — per-row results without collapsing rows
- [[SQL for Interviews]] — CTEs in classic problems