Common Table Expressions (CTE): Syntax, Types, & Examples

Jim Kutz
September 9, 2025
20 min read

Summarize with ChatGPT

Summarize with Perplexity

Data teams often spend a significant amount of time dealing with complex SQL queries that can span hundreds of lines, making debugging difficult and hindering collaboration. This challenge becomes even more acute when dealing with hierarchical data structures or multi-step transformations that require intermediate results to be referenced multiple times within the same query.

Common Table Expressions (CTEs) solve this fundamental readability and maintainability problem by providing a way to break complex queries into logical, named components. A Common Table Expression is a temporary result set that exists only within the execution scope of a single SQL statement and can be referenced multiple times throughout that statement. Unlike subqueries, CTEs offer superior readability and reusability, while differing from temporary tables by being ephemeral constructs that disappear immediately after query execution.

You can reference CTEs within SELECT, INSERT, UPDATE, or DELETE statements, making them versatile tools for data manipulation and analysis. They essentially function as named subqueries that can simplify complex operations, improve code organization, and enable recursive operations for hierarchical data processing.

Why Should You Use Common Table Expressions in Your Data Workflows?

CTEs enable you to effectively manage complex queries in long SQL code while providing several strategic advantages for data engineering workflows. Modern data teams leverage CTEs not just for query organization, but as fundamental building blocks for maintainable data transformation pipelines.

You can label different blocks of code to understand the purpose of each part of long queries, creating self-documenting SQL that reduces cognitive load during code reviews and debugging sessions. This labeling becomes particularly valuable in collaborative environments where multiple team members need to understand and modify complex analytical queries.

They facilitate the logical organization of SQL queries, making it easier to identify and resolve bugs by isolating specific transformation steps. When a complex query produces unexpected results, you can examine each CTE individually to pinpoint exactly where the logic breaks down, dramatically reducing debugging time.

You can reference the same CTE across multiple parts of a query without rewriting the same sub-query repeatedly, following the DRY (Don't Repeat Yourself) principle that reduces maintenance overhead. This reusability becomes crucial in analytics queries that need to apply the same filtering or aggregation logic across different sections of the final output.

CTEs also serve as stepping stones for complex transformations that would be difficult to express in a single, monolithic query. By breaking transformations into sequential, logical steps, you create more maintainable code that other team members can understand and modify with confidence.

How Do You Structure Common Table Expression Syntax?

CTEs are defined using the keyword WITH, followed by the name of the CTE and the query that defines the result set. The basic syntax provides flexibility for both simple and complex scenarios:

WITH cte_name (column1, column2, ...) AS (
    SELECT ...
    FROM ...
    WHERE ...
)

Where:

  • cte_name represents the identifier you assign to the CTE, which should be descriptive and indicate the purpose of the intermediate result set
  • (column1, column2, …) defines the optional column aliases for the CTE output, useful when you need to rename columns from the underlying query
  • SELECT … FROM … WHERE … contains the query logic that produces the result set, which can include joins, aggregations, window functions, and other SQL operations

The column list specification is optional when column names from the underlying query are sufficient. However, explicit column naming improves readability and prevents ambiguity, especially in complex queries with multiple joins or calculated fields.

You can chain multiple CTEs together using comma separation, allowing each subsequent CTE to reference previous ones. This chaining capability enables step-by-step data transformations that build upon each other, creating clear logical progression through complex analytical operations.

The CTE definition must immediately precede the main query that uses it. You cannot define CTEs in isolation or reference them across separate SQL statements, as they exist only within the scope of the single statement where they are defined.

What Are the Core Types of Common Table Expressions?

There are different types of CTEs in data engineering, each serving specific use cases and transformation patterns in modern data workflows.

Non-recursive CTEs

Non-recursive CTEs don't refer to themselves repeatedly. They are typically used during transformations, filtering, or aggregation operations where you need to create intermediate result sets for clarity and reusability.

WITH cte_name AS (
   SELECT * 
   FROM table_name
   WHERE condition
)
SELECT * 
FROM cte_name
WHERE additional_condition;
  • WITH initiates the CTE definition
  • cte_name serves as the identifier for referencing the CTE in subsequent operations
  • (SELECT * FROM table_name WHERE condition) defines the CTE's content through standard SQL query operations
  • SELECT * FROM cte_name represents the main query that retrieves data from the CTE

Non-recursive CTEs excel at breaking down complex analytical queries into manageable steps. For example, you might use a non-recursive CTE to calculate customer lifetime value metrics in one step, then apply additional filters and joins in the main query without repeating the complex aggregation logic.

Recursive CTEs

A recursive CTE refers to itself and allows repeated execution of queries, building the result in stages. This capability proves essential for hierarchical data such as organizational charts, product categories, or any parent-child relationship structures.

WITH expression_name (column_list) AS (
    -- Anchor member
    initial_query  

    UNION ALL

    -- Recursive member that references expression_name
    recursive_query  
)
SELECT *
FROM expression_name;
  • expression_name serves as the identifier for the recursive CTE
  • (column_list) provides an optional explicit list of columns in the result set
  • Anchor member establishes the starting point of the recursion with base case data
  • UNION ALL combines the result sets of the anchor and recursive members
  • Recursive member contains the self-referencing logic that processes each iteration

Recursive CTEs operate through iterative execution where each iteration processes the results from the previous iteration until no new rows are produced. This approach enables traversal of hierarchical structures without knowing the depth in advance, making it invaluable for organizational reporting, bill-of-materials analysis, and network analysis scenarios.

Multiple CTEs

You can define multiple CTEs in a single query, each one able to build on the previous, creating sophisticated transformation pipelines within a single SQL statement:

WITH cte1 AS (
    SELECT ...
),
cte2 AS (
    SELECT ...
    FROM cte1
),
cte3 AS (
    SELECT ...
    FROM cte2
)
SELECT *
FROM cte3;

Here, cte1, cte2, cte3 represent separate CTEs chained together in a sequential transformation pipeline. Each CTE can reference any previously defined CTE in the same WITH clause, enabling complex data transformations that maintain readability through logical separation of concerns.

Multiple CTEs prove particularly valuable in data engineering scenarios where you need to perform sequential transformations like data cleansing, enrichment, and aggregation within a single analytical query. This pattern reduces the need for temporary tables while maintaining clear separation between transformation steps.

What Are Common Use-Case Examples for Common Table Expressions?

CTEs address specific analytical and operational challenges that frequently arise in data processing workflows. Understanding these use cases helps you identify opportunities to improve query maintainability and performance.

Recursive CTEs help query hierarchical datasets like employee organizational charts where you need to traverse manager-subordinate relationships without knowing the organizational depth. For instance, you can find all employees reporting to a specific executive across multiple levels of hierarchy using a recursive CTE that starts with the target executive and iteratively finds their direct and indirect reports.

CTEs serve as an alternative to creating temporary views in a database, providing similar functionality without the overhead of managing persistent database objects. This approach proves particularly valuable in analytical workflows where you need intermediate result sets but don't want to clutter the database schema with temporary objects that require cleanup.

A CTE lets you perform the same calculations multiple times within a single query without repetition, following DRY principles that reduce maintenance overhead. For example, in financial reporting scenarios, you might calculate period-over-period growth rates in a CTE and then reference those calculations in multiple sections of your final output without recalculating the complex growth logic.

CTEs enable complex aggregation patterns where you need to aggregate data at multiple levels within the same query. Revenue reporting often requires both detailed line-item data and various aggregation levels (by product, region, time period), which CTEs can handle elegantly by providing different aggregation views that the main query can combine.

Data quality checks benefit from CTEs when you need to identify and handle data anomalies before performing main analytical operations. You can use CTEs to identify outliers, missing values, or inconsistent data formats, then reference these quality metrics in your main analytical query to ensure robust results.

How Do Common Table Expressions Compare to Alternative SQL Constructs?

Feature

Common Table Expression

Subquery

View

Definition

Temporary result set used within a larger query

Query nested inside another query

Virtual table representing a stored query

Naming

Mandatory

Optional

Mandatory

Syntax

WITH statement

WHERE, FROM, HAVING clauses

CREATE VIEW statement

Re-usability

Multiple times within the same query

Only once within the parent query

Across multiple queries

Recursiveness

Supports recursion

Not recursive

Can be recursive if based on recursive query

Persistence

Query execution only

Query execution only

Persists in database catalog

Performance

May cache results temporarily

Evaluated each reference

Evaluated each query

The comparison reveals that CTEs occupy a middle ground between subqueries and views, offering reusability within a single statement without the overhead of permanent database objects. This positioning makes CTEs particularly valuable for complex analytical queries that need intermediate results but don't justify creating permanent views.

CTEs provide superior readability compared to nested subqueries, especially when dealing with multiple levels of nesting or complex transformation logic. While subqueries can become difficult to understand when deeply nested, CTEs maintain clarity through named, sequential definitions that mirror logical thinking processes.

Views offer broader reusability across multiple queries but require database management overhead including permissions, documentation, and lifecycle management. CTEs provide similar logical organization benefits without the administrative complexity, making them ideal for ad-hoc analysis and exploratory data work.

What Are the Key Limitations When Using Common Table Expressions?

Understanding when not to use CTEs helps you make informed decisions about query optimization and maintainability. Several scenarios warrant alternative approaches that may provide better performance or functionality.

When query performance is paramount and readability is secondary, alternative approaches like temporary tables with indexes might provide better execution plans. CTEs cannot be indexed directly, which may result in full table scans where indexed temporary tables could provide more efficient access patterns.

If you are new to SQL and find CTE syntax complex, starting with simpler subquery patterns might provide a gentler learning curve. However, investing time in learning CTE syntax pays dividends in long-term query maintainability and collaboration effectiveness.

Avoid recursive CTEs with very large data volumes as they can be expensive to execute and may consume significant memory resources during iteration. For deep hierarchical traversals over millions of rows, consider alternative approaches like iterative procedural code or specialized graph databases that handle hierarchical data more efficiently.

CTEs cannot be referenced across multiple separate SQL statements, limiting their usefulness in scenarios where you need persistent intermediate results for multiple analytical operations. In such cases, temporary tables or materialized views might provide better functionality.

Some database platforms implement CTEs with optimization limitations that prevent efficient predicate pushdown or other query optimizations. Understanding your specific database platform's CTE implementation helps you make informed decisions about when to use alternative approaches.

Which Database Platforms Support Common Table Expressions?

Modern database platforms have embraced CTEs as essential functionality for complex analytical workloads. Understanding platform-specific support and limitations helps you write portable SQL code and choose appropriate implementation strategies.

  • PostgreSQL provides comprehensive CTE support including recursive CTEs and materialization control options
  • MySQL version 8.0 and later supports both non-recursive and recursive CTEs with full functionality
  • Oracle Database offers robust CTE implementation with advanced optimization features
  • SQLite includes basic CTE support suitable for lightweight analytical operations
  • SQL Server provides extensive CTE functionality with sophisticated optimization capabilities
  • MariaDB supports CTEs with performance optimizations that are distinct from those in MySQL, tailored to its own architecture
  • IBM Db2 includes enterprise-grade CTE implementation with advanced recursion controls

Platform-specific differences primarily affect recursive CTE syntax and optimization behavior. Some platforms require the RECURSIVE keyword for recursive CTEs, while others make it optional. Understanding these differences helps you write more portable code or optimize for specific database platforms.

Cloud data warehouses like Snowflake, BigQuery, and Redshift provide advanced CTE optimizations that can handle petabyte-scale analytical workloads efficiently. These platforms often include automatic optimization features that make CTEs perform well even in complex analytical scenarios.

Legacy database versions may have limited CTE support or performance limitations that affect large-scale analytical operations. When working with older database versions, testing CTE performance against alternative approaches helps ensure optimal query execution.

How Do Advanced CTE Patterns Support Modern Data Engineering?

Modern data engineering workflows have evolved beyond basic CTE implementations to support complex transformation patterns and real-time data processing. These advanced patterns leverage CTEs within sophisticated data architectures to enable more efficient and maintainable data pipelines.

Ephemeral Materialization in dbt

The rise of transformation tools like dbt has revolutionized how CTEs function in data modeling. Ephemeral materialization allows dbt models to compile into CTEs rather than physical tables, creating zero-footprint transformations that reduce storage costs while maintaining data lineage.

-- dbt model configured as ephemeral
{{ config(materialized='ephemeral') }}

WITH customer_metrics AS (
    SELECT 
        customer_id,
        SUM(order_value) AS lifetime_value,
        COUNT(*) AS order_count
    FROM {{ ref('orders') }}
    GROUP BY customer_id
)
SELECT * FROM customer_metrics

This pattern enables modular SQL development where CTEs serve as reusable components across multiple models, improving code maintainability and reducing redundancy in transformation logic. Data teams can build complex analytical models by composing simpler CTE-based building blocks, creating more testable and maintainable data transformation pipelines.

CTE-Based Incremental Loading

Modern data platforms utilize specialized CDC mechanisms, such as log-based capture or triggers, as the foundation for change data capture patterns, enabling efficient incremental data loading strategies. SQL constructs like CTEs can assist in processing modified records within this context, helping minimize computational overhead in incremental workflows.

WITH new_records AS (
    SELECT *
    FROM source_table
    WHERE updated_at > (SELECT MAX(updated_at) FROM target_table)
),
merged_data AS (
    SELECT * FROM target_table
    UNION ALL
    SELECT * FROM new_records
)
SELECT * FROM merged_data;

This pattern proves particularly effective in high-volume environments where full table refreshes become computationally expensive, enabling sub-minute data latencies for business-critical analytics. Organizations can maintain near real-time analytical capabilities while controlling computational costs through efficient incremental processing.

AI-Driven CTE Generation

Emerging AI-powered tools now automatically generate optimized CTE queries from natural language descriptions, reducing the time data engineers spend writing complex SQL. These systems analyze query patterns and automatically structure CTEs for optimal performance based on data characteristics and usage patterns.

The integration of AI in CTE development represents a significant shift toward automated query optimization, where intelligent systems can suggest CTE structures based on historical performance data and best practices. This automation helps democratize advanced SQL techniques while maintaining query performance and maintainability standards.

What Are Advanced Recursive CTE Optimization Techniques?

Recursive CTEs present unique optimization challenges that require specialized techniques for handling large-scale hierarchical data processing. Modern database platforms provide sophisticated controls for managing recursive query performance and resource consumption.

Recursion Depth Management

Database platforms implement various mechanisms for controlling recursion depth to prevent runaway queries and manage resource consumption effectively. SQL Server provides the OPTION (MAXRECURSION n) hint to limit iteration count, while PostgreSQL does not have a direct query or configuration option for recursion control in recursive CTEs.

WITH RECURSIVE org_hierarchy AS (
    -- Anchor member: top-level managers
    SELECT employee_id, manager_id, name, 0 as level
    FROM employees 
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member with level tracking
    SELECT e.employee_id, e.manager_id, e.name, oh.level + 1
    FROM employees e
    INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
    WHERE oh.level < 10  -- Explicit depth limitation
)
SELECT * FROM org_hierarchy
OPTION (MAXRECURSION 15);

The code as written is valid T-SQL for SQL Server, but the 'OPTION (MAXRECURSION 15)' query hint must be removed to be valid in PostgreSQL; the rest of the CTE structure works in both systems.

Effective recursion depth management involves both explicit query-level controls and database configuration parameters. Setting appropriate limits prevents resource exhaustion while ensuring legitimate hierarchical traversals can complete successfully.

Memory-Efficient Recursion Patterns

Large-scale recursive operations require careful attention to memory management and intermediate result set sizing. Optimized recursive CTEs employ filtering strategies that minimize working set size during each iteration, reducing memory pressure and improving performance.

Early filtering within recursive members reduces the size of intermediate result sets, preventing exponential memory growth during deep traversals. This technique proves particularly valuable when traversing large organizational hierarchies or product categorization trees where most paths don't require full traversal.

Termination condition optimization ensures recursive operations complete efficiently without unnecessary iterations. Well-designed recursive CTEs include explicit termination logic that stops recursion when business logic requirements are satisfied, rather than relying solely on empty result set termination.

Cycle Detection and Prevention

Hierarchical data often contains cycles or circular references that can cause infinite recursion if not properly handled. Some modern database systems, such as PostgreSQL (since version 14) and Oracle (with the CYCLE clause), provide built-in cycle detection mechanisms in recursive CTEs, but most systems require manual handling of cycles to prevent infinite loops and identify problematic data relationships.

WITH RECURSIVE hierarchy_with_cycle_detection AS (
    -- Anchor member with path tracking
    SELECT employee_id, manager_id, name, 
           ARRAY[employee_id] as path,
           false as is_cycle
    FROM employees 
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member with cycle detection
    SELECT e.employee_id, e.manager_id, e.name,
           h.path || e.employee_id,
           e.employee_id = ANY(h.path) as is_cycle
    FROM employees e
    INNER JOIN hierarchy_with_cycle_detection h ON e.manager_id = h.employee_id
    WHERE NOT h.is_cycle
)
SELECT * FROM hierarchy_with_cycle_detection;

The provided SQL code correctly detects cycles in employee-manager hierarchies on platforms that support array operations (such as PostgreSQL), but it will not work in standard SQL or most other database systems that lack array and 'ANY' support.

Path tracking enables detection of circular references by maintaining the traversal history for each recursive branch. When a recursive operation encounters a previously visited node, it can terminate that branch while continuing to process other paths, ensuring complete traversal without infinite loops.

How Do Database-Specific CTE Optimizations Impact Performance?

Different database management systems implement CTEs with varying optimization strategies, significantly affecting performance in large-scale operations. Understanding these platform-specific behaviors enables data engineers to make informed decisions about CTE implementation and optimization.

PostgreSQL Materialization Control

PostgreSQL offers explicit materialization control for Common Table Expressions (CTEs) using the 'MATERIALIZED' keyword, allowing for performance optimization in certain cases:

WITH MATERIALIZED sales_summary AS (
    SELECT region, SUM(revenue) AS total_revenue
    FROM sales_data
    GROUP BY region
)
SELECT * FROM sales_summary
WHERE total_revenue > 100000;

Materialized CTEs cache intermediate results, proving beneficial when CTEs are referenced multiple times within a query or when the intermediate computation is expensive relative to result set filtering. However, materialization can create optimization barriers that prevent predicate pushdown from outer queries, potentially limiting performance optimization opportunities.

The choice between materialized and non-materialized CTEs depends on specific query patterns and data characteristics. Single-reference CTEs with expensive computations benefit from materialization, while CTEs that enable predicate pushdown perform better without materialization.

Cloud Data Warehouse Optimizations

Modern cloud platforms like Snowflake and BigQuery implement sophisticated general query optimization strategies that automatically determine execution plans based on query complexity and data volume. Snowflake's query optimizer uses cost analysis for overall query planning, while BigQuery leverages techniques like predicate pushdown and partitioning at the table level to minimize I/O overhead.

These optimizations enable CTEs to operate efficiently at petabyte scales, with cloud platforms processing terabytes of data through CTE-based transformations in minutes rather than hours. Cloud-native optimization often includes automatic parallelization and distributed execution that traditional database platforms cannot match.

Advanced cloud platforms provide query execution insights that help data engineers understand CTE performance characteristics and optimization opportunities. These platforms often include automated tuning suggestions that recommend CTE restructuring for improved performance based on historical execution patterns.

Database-Specific Best Practices

Each database platform requires tailored optimization approaches based on its specific CTE implementation and query optimization capabilities:

  • PostgreSQL: Use NOT MATERIALIZED for single-reference CTEs over large datasets to enable predicate pushdown and reduce memory consumption
  • SQL Server: Configure parallel execution parameters to prevent premature parallelization in CTE-heavy workloads that can cause resource contention
  • MySQL: By default, rely on the optimizer to decide whether to materialize multiply-referenced CTEs for performance, using explicit materialization syntax only for advanced tuning when necessary.
  • Oracle: Apply INLINE hints to enable optimization across CTE boundaries and improve overall query execution efficiency

Understanding these platform-specific behaviors allows data engineers to optimize CTE performance based on their specific database environment and workload characteristics. Regular performance testing across different optimization strategies helps identify the most effective approaches for specific use cases and data volumes.

What Are the Key Performance Considerations for Common Table Expressions?

Avoid Unnecessary Use

If a subquery is referenced only once, using a CTE may add needless complexity without providing reusability benefits. Simple subqueries often perform better due to optimizer flexibility in execution plan generation, while CTEs may introduce materialization overhead that impacts performance in single-use scenarios.

Consider the computational complexity of CTE operations relative to the overall query performance requirements. Complex aggregations or joins within CTEs that are referenced multiple times benefit from CTE structure, while simple filtering operations may perform better as inline subqueries.

Ensure Database Support

Most modern RDBMS support CTEs, but some legacy systems have limitations or performance characteristics that affect large-scale analytical operations. MySQL versions prior to 8.0 lack CTE support entirely, while older PostgreSQL versions may have optimization limitations that impact complex analytical queries.

Verify recursive CTE support specifically, as some databases implement non-recursive CTEs but lack recursive functionality. Understanding platform-specific CTE capabilities helps you write portable code and choose appropriate implementation strategies for your specific database environment.

Complexity of Recursive Queries

For very complex recursion over large datasets, consider procedural extensions such as PL/SQL (Oracle) or T-SQL (SQL Server) that provide more sophisticated control flow and debugging capabilities. Recursive CTEs work well for moderate-depth hierarchies but may struggle with deep recursion or complex termination conditions that are better handled through procedural approaches.

Evaluate memory consumption patterns for recursive CTEs, as they maintain intermediate result sets during iteration that can consume significant resources. Deep hierarchical traversals over large datasets may require iterative approaches or specialized graph processing tools rather than recursive SQL constructs.

How Do Common Table Expressions Apply in Extract, Transform, Load Processes?

Extract, transform, and load (ETL) integrates data from multiple sources into a centralized destination for analytics and reporting. CTEs are especially helpful during the transform step by simplifying complex SQL transformations and enhancing query reusability across different data processing stages.

In an ETL pipeline, tools like Airbyte can handle data movement efficiently through automated extraction and loading processes. It offers 600+ pre-built connectors to extract and load data from diverse sources including databases, APIs, and SaaS applications. After loading data into your target warehouse, you can leverage CTEs for sophisticated in-warehouse transformations that prepare data for analytical consumption.

CTEs prove particularly valuable in ETL transformation stages that require multi-step data cleansing and enrichment operations. For example, you might use sequential CTEs to standardize data formats, deduplicate records, apply business logic transformations, and calculate derived metrics within a single, maintainable SQL statement.

To implement ETL programmatically, Airbyte provides the open-source Python library PyAirbyte, which lets you extract data with Airbyte connectors and load it into SQL stores such as PostgreSQL or Snowflake. This approach enables you to build custom ETL workflows that combine Airbyte's extensive connector ecosystem with CTE-based transformation logic for maximum flexibility.

Image 1: Airbyte

Key Airbyte Features

  • AI-powered Connector Builder – accelerates custom connector creation with intelligent suggestions
  • Streamlined GenAI Workflows – load semi-structured or unstructured data directly into vector-store destinations for AI applications
  • Change Data Capture (CDC) – incrementally replicates changes from source to destination, keeping them in sync with minimal latency
  • Deployment Flexibility – choose Self-managed, Cloud-hosted, or Hybrid deployment options based on security and compliance requirements

Airbyte's normalization processes often generate CTEs automatically during data transformation, particularly when handling incremental updates and slowly changing dimensions. This integration demonstrates how modern data platforms leverage CTEs as foundational components for reliable, maintainable data pipelines that scale with organizational growth.

Conclusion

Common Table Expressions in SQL are versatile tools that help you manage large queries by breaking them into manageable, logical components that improve readability and maintainability. They prove invaluable for data integration tasks, particularly for data transformation and cleaning operations that require sophisticated multi-step processing.

This guide has covered CTE types, benefits, syntax, advanced optimization techniques, and real-world use cases to help you decide when and how to implement them effectively. With these concepts and best practices, you can query and analyze data more effectively in your SQL workflows while building more maintainable and performant data processing pipelines.

CTEs represent a fundamental advancement in SQL capabilities, enabling data professionals to write more expressive, maintainable queries while supporting complex analytical operations that would be difficult to implement using traditional SQL constructs. As data volumes and complexity continue to grow, mastering CTE techniques becomes increasingly valuable for building scalable, reliable data processing systems.

Limitless data movement with free Alpha and Beta connectors
Introducing: our Free Connector Program
The data movement infrastructure for the modern data teams.
Try a 14-day free trial