How to Work with PostgreSQL Subquery: Easy Steps Explained

•

January 23, 2026

Summarize this article with:

✨ AI Generated Summary

You’ve seen it before. A simple query crawls because the same inner SELECT runs thousands of times. The logic is correct, but a correlated subquery forces PostgreSQL to repeat work for every row.

Subqueries let you nest a SELECT inside another statement so PostgreSQL can compute a value once or derive a focused result set exactly where it is needed. Uncorrelated subqueries run once. Correlated ones run per row and can become a performance trap. They are powerful, but not always the best choice.

Understanding how PostgreSQL executes subqueries helps you know when to use them and when to reach for joins, CTEs, or window functions instead.

TL;DR: PostgreSQL Subqueries at a Glance

PostgreSQL subqueries allow you to embed a SELECT inside another statement to calculate values, filter rows, or check existence without creating temporary tables
Uncorrelated subqueries are evaluated once and folded into the query plan, making them safe and predictable even on large tables
Correlated subqueries depend on outer query columns, execute once per row, and often introduce nested loops that can slow queries by orders of magnitude
Use subqueries for dynamic filters, single-value lookups, and EXISTS checks, but prefer joins, CTEs, or window functions when you need to aggregate or scan large datasets efficiently

What Is a PostgreSQL Subquery and When Should You Use One?

A subquery is a SELECT tucked inside another SQL statement. PostgreSQL evaluates the inner query first (unless it depends on outer rows) and passes its result to the outer query.

The inner logic lives in parentheses, so you can slice complex problems into readable chunks. Need to filter employees above their department's average pay? A subquery handles that without sprawling joins.

You'll reach for subqueries when you need dynamic filters, per-row existence checks, or single-value lookups that would be awkward with plain joins. Uncorrelated subqueries run once and integrate smoothly into the query plan. Correlated ones execute for every outer row and can stall on large tables.

The canonical pattern follows this structure: SELECT select_list FROM table1 WHERE column operator (SELECT column FROM table2 WHERE condition);.

This foundation supports three distinct result shapes, each serving different use cases.

1. Scalar Subqueries

A scalar subquery must yield one row and one column. PostgreSQL treats that single value like a literal, so you can drop it into WHERE, SELECT, or HAVING clauses.

If the subquery returns zero rows, the result becomes NULL. More than one row throws a runtime error. That makes aggregates (AVG, MAX, MIN) your safe bet when you can't guarantee a single match.

SELECT order_id, total_amount
FROM orders
WHERE total_amount >
      (SELECT AVG(total_amount) FROM orders);

PostgreSQL calculates the average once, then compares every order to that figure. Because the inner query is uncorrelated, the planner folds it into the plan, and performance stays predictable even on millions of rows.

2. Row Subqueries

Row subqueries return one row with multiple columns and are compared as composite values. You wrap both sides in row constructors so PostgreSQL can perform element-wise comparison.

SELECT employee_id, first_name, last_name
FROM employees
WHERE (department_id, manager_id) =
      (SELECT department_id, manager_id
       FROM departments
       WHERE location_id = 1800);

If either side contains a NULL, the comparison resolves to NULL (and doesn't pass the filter). More than one row from the subquery triggers an error. Zero rows make the whole comparison NULL.

This technique works well for matching compound keys without verbose AND chains.

3. Table Subqueries

Table subqueries deliver full result sets (multiple rows and columns) and behave like temporary tables. You'll see them in two main spots: in the FROM clause as a derived table that needs an alias, or inside set operators (IN, EXISTS, ANY, ALL) in a WHERE filter.

SELECT first_name, last_name
FROM employees
WHERE department_id IN
      (SELECT department_id
       FROM departments
       WHERE location_id = 1800);

The subquery sits in the IN list, so PostgreSQL hashes the result once, then probes it for every employee. When you move the same logic into a derived table in FROM, the optimizer may rewrite it into a join, often gaining speed through predicate push-down and early aggregation.

How Do You Write a Basic Subquery in PostgreSQL?

Breaking complex questions into inner and outer queries keeps the logic clear and errors obvious. The key is drafting the inner query, embedding it, then validating.

1. Start With the Inner Query

Always run the subquery on its own to confirm it returns the right "shape." A scalar subquery must produce exactly one value, a row subquery one row, and a table subquery any number of rows.

If the result can be empty, note how that will propagate: scalar subqueries return NULL, which changes comparison logic.

-- Stand-alone test: average salary (scalar)
SELECT AVG(salary) 
FROM employees;

Running in isolation surfaces hidden NULLs. A NOT IN comparison fails silently when the subquery includes NULL, so spotting them early lets you switch to NOT EXISTS instead. When the subquery returns many rows, add LIMIT 10 to preview the data without flooding your console.

2. Place the Subquery in the Outer Query

Once the inner query behaves correctly, wrap it in parentheses and drop it where the outer query needs it. Most often this is in a WHERE clause, but also in FROM (as a derived table) or in the SELECT list.

PostgreSQL's parser demands those parentheses; forgetting them raises an unmistakable syntax error.

SELECT  c.customer_id,
        c.first_name,
        c.last_name
FROM    customers AS c
WHERE   c.customer_id IN (
  SELECT DISTINCT customer_id
  FROM   orders
  WHERE  order_date >= CURRENT_DATE - INTERVAL '30 days'
);

The subquery executes first, returning recent customer IDs. The outer query then filters the customers table. If you move a subquery to the FROM clause, always provide an alias: FROM (SELECT ...) AS recent_orders.

Operator compatibility matters. Scalar subqueries pair with =, >, or <. Table subqueries belong with IN, EXISTS, ANY, or ALL.

3. Test and Validate Results

With both layers in place, sanity-check the output. Eyeball a few rows to ensure the filter behaves as intended, then compare row counts against expectations. Next, run EXPLAIN ANALYZE to see whether PostgreSQL executes the subquery once (good) or per outer row (a correlated performance red flag).

EXPLAIN ANALYZE
SELECT  c.customer_id
FROM    customers c
WHERE   NOT EXISTS (            -- NULL-safe negation
  SELECT 1
  FROM   orders o
  WHERE  o.customer_id = c.customer_id
);

If the plan shows nested loops on large tables, consider rewriting the logic as a JOIN or adding indexes to the correlated columns. When PostgreSQL complains that a scalar subquery "returns more than one row," either add an aggregate like MAX() or LIMIT 1 to enforce a single value.

What Are the Different Types of Subquery Placements?

You can place a subquery almost anywhere in a PostgreSQL statement, but where you put it changes both readability and runtime behavior. The planner treats a subquery in the WHERE clause very differently from the same logic embedded in FROM or SELECT.

1. Subqueries in the WHERE Clause

When you drop a subquery into WHERE, you're asking PostgreSQL to filter the outer rows based on a dynamic condition. Uncorrelated versions execute once and then behave like a constant list or value, while correlated versions run for every outer row and can turn into an N × M loop on large tables.

SELECT p.product_id, p.price
FROM products p
WHERE p.price > (
  SELECT AVG(price)
  FROM products
  WHERE category_id = p.category_id
);

The inner query recalculates the average per category because it references p.category_id, so the planner falls back to a nested-loop strategy. On a table with 10k rows, that pattern can be 10–100× slower than a join or window function.

For uncorrelated filters (like a static list of department IDs), PostgreSQL often rewrites the subquery into a hash join, so the cost difference versus a normal JOIN is minimal. Performance improves when you prefer EXISTS over IN for subqueries that may return hundreds or thousands of rows, since EXISTS short-circuits on the first match. Creating indexes on columns that appear in the correlation predicate also reduces the per-row lookup cost.

2. Subqueries in the FROM Clause (Derived Tables)

Placing the subquery in FROM turns its result set into a temporary table you can join like any other relation. The syntax always needs an alias:

SELECT c.category_name,
       s.total_sales,
       s.product_count
FROM categories c
JOIN (
  SELECT category_id,
         SUM(quantity * unit_price) AS total_sales,
         COUNT(*) AS product_count
  FROM order_items
  GROUP BY category_id
) AS s ON s.category_id = c.category_id;

Here the aggregation happens before the join, so only a handful of grouped rows flow into the rest of the plan. This pattern can even run 20% faster than equivalent joins because the optimizer pushes predicates into the subquery and materializes a much smaller dataset.

Use a derived table when you need to pre-aggregate or transform data once and reference it immediately, or when you want to avoid cluttering the main query with repeated calculations that don't justify a full CTE.

3. Subqueries in the SELECT Clause

Embedding a subquery in the SELECT list adds a computed scalar column to every output row. PostgreSQL must execute that inner query once per outer row unless it can prove it's uncorrelated.

SELECT e.employee_name,
       (
         SELECT COUNT(*)
         FROM orders o
         WHERE o.employee_id = e.id
       ) AS order_count
FROM employees e;

Each call to the subquery counts orders for a single employee. On a few hundred rows, the latency is negligible; on millions, the N+1 pattern becomes brutal. Correlated SELECT subqueries can take 10–100× longer than rewriting the same logic with a LEFT JOIN and GROUP BY or a window function like COUNT(*) OVER (PARTITION BY employee_id).

When you need a row-level aggregate, first consider a window function because it computes once per partition rather than once per row. If you must keep the subquery, make sure the inner tables are indexed on the correlation key so each lookup is an index seek, not a full scan.

When Should You Use EXISTS vs IN with Subqueries?

Choosing between EXISTS and IN dictates how PostgreSQL builds the execution plan and, ultimately, how fast your query runs. Here's a quick decision matrix:

Scenario	Use EXISTS	Use IN
Large or variable subquery result	✓	—
Small, predictable list (< ~12 rows)	—	✓
Correlated subquery per outer row	✓	—
Subquery may contain NULLs	✓	—
Readability for simple lists	—	✓

What Are Correlated Subqueries and How Do They Work?

A correlated subquery references columns from your outer query, forcing PostgreSQL to rerun the inner query for every single row. This gives you row-specific comparisons, but the performance cost can spike from milliseconds to minutes on large tables.

Say you want every employee who earns more than their department's average salary. The comparison depends on each employee's department_id, so the inner query must recalculate for every outer row:

SELECT e.employee_id,
       e.first_name,
       e.salary
FROM   employees e
WHERE  e.salary >
       (SELECT AVG(salary)
        FROM   employees
        WHERE  department_id = e.department_id);

PostgreSQL can't treat this as a one-time lookup because the subquery references e.department_id. Instead, it builds a nested-loop plan: 10,000 employees means 10,000 inner scans, creating O(n²) work. This pattern runs 10–100× slower than equivalent joins or window functions on medium-sized datasets.

Check EXPLAIN ANALYZE for the telltale sign: a "Nested Loop" node where the inner side queries the same table repeatedly.

Use correlated subqueries for existence checks (WHERE EXISTS), per-row thresholds, or hierarchical lookups. These subqueries demand indexes on the columns linking inner and outer queries. When latency matters, consider these rewrites:

Use window functions like AVG(salary) OVER (PARTITION BY department_id) to calculate once per group
Convert to joins on pre-aggregated derived tables
Use LATERAL joins for per-row, set-returning behavior that optimizes better

Pick the right approach and keep subquery convenience without the performance hit.

How Can You Move PostgreSQL Data After Querying It?

Once you've perfected the subquery logic that surfaces the right slice of data, the next hurdle is getting that data where it actually delivers value, typically a warehouse or analytics platform. Snowflake, BigQuery, Databricks, and Redshift remain popular destinations, but operational databases and SaaS tools often sit on the itinerary as well.

Exporting CSV files or running ad-hoc Python scripts works for a proof-of-concept, yet those approaches break down fast. Cron jobs miss schedules, file formats drift, and you end up nursing fragile bash scripts rather than shipping features. Change Data Capture (CDC) replication is a cleaner option: it streams only inserts, updates, and deletes so your production tables avoid the full-table scans that drag performance.

That's exactly where Airbyte fits into your workflow. The PostgreSQL connector offers both full-refresh loads for the initial backfill and CDC replication for continuous updates. You hook up the source, choose one of 600+ destinations, and let the platform handle schema drift, retries, and secret rotation. Because pricing is capacity-based, your costs stay predictable even when data volume surges five-fold.

Ready to move PostgreSQL data without babysitting brittle scripts? Try Airbyte to spin up a sync, point it at your warehouse, and focus on analysis instead of pipeline maintenance.

For teams managing multiple PostgreSQL sources or high-volume CDC workloads, talk to sales to learn how capacity-based pricing keeps replication costs predictable as your data grows.

Frequently Asked Questions

Are PostgreSQL subqueries slower than JOINs?

Not always. Uncorrelated subqueries are often optimized into joins or hash lookups and can perform just as well. Performance problems usually come from correlated subqueries that execute once per outer row. When you see nested loops in EXPLAIN ANALYZE, a JOIN or window function is often the faster alternative.

When should I avoid correlated subqueries?

Avoid correlated subqueries on large tables or high-traffic queries, especially when the inner query scans many rows. They are best suited for existence checks with EXISTS or small, well-indexed lookups. If you need per-group aggregates or counts, window functions or pre-aggregated joins scale better.

Why does NOT IN sometimes return no rows?

NOT IN behaves unexpectedly when the subquery returns NULL. If even one NULL appears in the result set, the comparison evaluates to unknown and filters out all rows. To avoid this, use NOT EXISTS, which is NULL-safe and usually performs better on larger datasets.

How can I tell if PostgreSQL runs a subquery once or per row?

Use EXPLAIN ANALYZE. Uncorrelated subqueries appear as init plans or hashed subplans that execute once. Correlated subqueries typically show nested loop execution where the inner query runs repeatedly. If you see high loop counts on the inner node, consider rewriting the query.

Limitless data movement with free Alpha and Beta connectors

Introducing: our Free Connector Program

The data movement infrastructure for the modern data teams.

Try a 30-day free trial

About the Author

Jim Kutz brings over 20 years of experience in data analytics to his work, helping organizations transform raw data into actionable business insights. His expertise spans predictive modeling, data engineering and data visualization, with a focus on making analytics accessible and impactful for stakeholders at all levels.