Back to Notes

SQL Subqueries & CTEs

SQL Subqueries & CTEs

Subqueries — Inline Queries

Scalar Subquery — Returns Single Value

-- Employees earning above average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Salary difference from company average
SELECT name, salary,
  salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;

Multi-Row Subquery — Returns List (use with IN / NOT IN / ANY / ALL)

-- Employees in Engineering or Product departments
SELECT name FROM employees
WHERE department_id IN (
  SELECT id FROM departments WHERE name IN ('Engineering', 'Product')
);

-- Salary greater than ANY senior employee (i.e., greater than min senior salary)
SELECT name, salary FROM employees
WHERE salary > ANY (SELECT salary FROM employees WHERE level = 'Senior');

-- Salary greater than ALL senior employees (i.e., greater than max senior salary)
SELECT name, salary FROM employees
WHERE salary > ALL (SELECT salary FROM employees WHERE level = 'Senior');

Correlated Subquery — References Outer Query

Runs once per row of the outer query. Slower than JOIN for large tables.

-- Employees earning above their department's average
SELECT e.name, e.salary, e.department
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary)
  FROM employees
  WHERE department = e.department  -- ← references outer query's row
);

-- Latest order per user (correlated)
SELECT o.*
FROM orders o
WHERE o.created_at = (
  SELECT MAX(created_at)
  FROM orders
  WHERE user_id = o.user_id  -- ← per-user max
);

EXISTS / NOT EXISTS

-- Users who have placed at least one order
SELECT u.name
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- Users with no orders (anti-join, handles NULLs safely)
SELECT u.name
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

EXISTS vs IN:

  • EXISTS short-circuits on first match — faster when subquery returns large result set
  • IN materializes entire subquery — efficient when subquery is small
  • NOT EXISTS is NULL-safe; NOT IN breaks if subquery contains NULLs

CTEs — Common Table Expressions

Named subqueries at the top of the query. Evaluated once, reusable.

-- Basic CTE: users who ordered > $1000 total
WITH high_value_users AS (
  SELECT user_id, SUM(amount) AS total_spent
  FROM orders
  GROUP BY user_id
  HAVING SUM(amount) > 1000
)
SELECT u.name, h.total_spent
FROM users u
JOIN high_value_users h ON u.id = h.user_id;

Multiple CTEs

WITH
active_users AS (
  SELECT user_id
  FROM sessions
  WHERE created_at >= NOW() - INTERVAL '30 days'
  GROUP BY user_id
  HAVING COUNT(*) >= 3
),
user_revenue AS (
  SELECT user_id, SUM(amount) AS revenue
  FROM orders
  GROUP BY user_id
)
SELECT u.name, r.revenue
FROM users u
JOIN active_users a  ON u.id = a.user_id
JOIN user_revenue r  ON u.id = r.user_id
ORDER BY r.revenue DESC;

CTE vs Subquery — When to Use Which

SubqueryCTE
ReadabilityNested, harder to readNamed, top-level, easier to follow
ReuseCan't reuseReuse by referencing same CTE name multiple times
DebuggingHard to test in isolationCan comment out main query and SELECT from CTE
PerformanceUsually same (optimizer flattens both)Same in most DBs; some DBs materialize CTEs

Rule of thumb: CTE when logic is complex, reused, or benefits from a name. Subquery for simple one-off inline filters.


Recursive CTEs

Traverse hierarchical data (org charts, folder trees, graphs).

WITH RECURSIVE org_tree AS (
  -- Anchor: start node
  SELECT id, name, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id IS NULL  -- top of the org

  UNION ALL

  -- Recursive: join against previous iteration's results
  SELECT e.id, e.name, e.manager_id, ot.depth + 1
  FROM employees e
  JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY depth, name;

How Recursive CTE Executes

1. Run anchor → working_table = {CEO}
2. Join employees against working_table → new rows = {VP1, VP2}
   working_table = {VP1, VP2}   (only NEW rows each pass)
   result_table += {VP1, VP2}
3. Join against {VP1, VP2} → new rows = {Dir1, Dir2, Dir3}
   working_table = {Dir1, Dir2, Dir3}
   result_table += {Dir1, Dir2, Dir3}
4. Continue until no new rows

Key: working_table = previous iteration's output only (not all accumulated rows). This is why it terminates for trees.

Recursive CTE — Find Path Between Nodes

WITH RECURSIVE path AS (
  SELECT id, name, manager_id, ARRAY[id] AS visited
  FROM employees
  WHERE id = @start_id

  UNION ALL

  SELECT e.id, e.name, e.manager_id, p.visited || e.id
  FROM employees e
  JOIN path p ON e.manager_id = p.id
  WHERE NOT e.id = ANY(p.visited)  -- cycle prevention
)
SELECT * FROM path;

Sequence Generation

-- Generate numbers 1 to 100
WITH RECURSIVE nums AS (
  SELECT 1 AS n
  UNION ALL
  SELECT n + 1 FROM nums WHERE n < 100
)
SELECT * FROM nums;

-- Generate date range
WITH RECURSIVE dates AS (
  SELECT '2026-01-01'::DATE AS d
  UNION ALL
  SELECT d + 1 FROM dates WHERE d < '2026-01-31'
)
SELECT * FROM dates;

Subquery in FROM Clause (Derived Table)

-- Same as CTE but inline
SELECT dept, avg_sal
FROM (
  SELECT department AS dept, AVG(salary) AS avg_sal
  FROM employees
  GROUP BY department
) dept_stats
WHERE avg_sal > 70000;

Interview Questions

Q: CTE vs subquery performance? → Usually identical — optimizer treats them the same. Exception: some DBs (older PostgreSQL) materialize CTEs (compute once, no re-optimization). PostgreSQL 12+ CTEs are inlined by default; use MATERIALIZED keyword to force old behavior.

Q: When is recursive CTE infinite loop risk? → Cyclic data (A→B→A). Guard with: cycle detection column (array of visited IDs), or MAXRECURSION limit (SQL Server default 100).


Related

  • [[JOINs]] — anti-join pattern alternatives
  • [[Window Functions]] — per-row results without collapsing rows
  • [[SQL for Interviews]] — CTEs in classic problems