Common Table Expressions (CTE): Syntax, Types, & Examples

November 28, 2024
20 min read

Summarize with ChatGPT

You can use SQL statements to store, retrieve, manipulate, and analyze data efficiently. To write queries in a cleaner and more organized way in SQL, you can use Common Table Expressions. CTEs enable you to simplify complicated SQL queries in an understandable form. Here, you will learn about CTEs in detail, along with their syntax, use cases, and limitations, to organize your SQL code blocks and query data effectively.

What Are Common Table Expressions in SQL?

In SQL, a CTE is a temporary result set that is used in the context of a larger query and exists for a short duration. It can be referenced within SELECT, INSERT, UPDATE, or DELETE statements. CTEs help you simplify complex queries by breaking them down into smaller parts. These fragments make long SQL statements more readable and understandable. However, you cannot store CTE results, as they disappear immediately after query execution.

Why Should You Use Common Table Expressions in SQL?

CTEs enable you to effectively manage complex queries in long SQL code. Some of the prominent benefits are:

  • You can label different blocks of code to understand the purpose of each part of long queries.
  • They facilitate the logical organization of SQL queries, making it easier to identify and resolve bugs.
  • You can reference the same CTE across multiple parts of a query without rewriting the same sub-query repeatedly.

What Is the Syntax of a Common Table Expression?

CTEs are defined using the keyword WITH, followed by the name of the CTE and the query that defines the result set:

WITH cte_name (column1, column2, ...) AS (
    SELECT ...
    FROM ...
    WHERE ...
)

Where:

  • cte_name – the name you give the CTE.
  • (column1, column2, …) – names of the columns returned by the CTE (optional).
  • SELECT … FROM … WHERE … – the query that produces the result set.

What Are the Different Types of CTEs in Data Engineering?

There are different types of CTEs in data engineering, some of which are discussed below.

Non-recursive CTEs

Non-recursive CTEs don't refer to themselves repeatedly. They are typically used during transformations, filtering, or aggregation operations.

WITH cte_name AS (
   SELECT * 
   FROM table_name
   WHERE condition
)
SELECT * 
FROM cte_name
WHERE additional_condition;
  • WITH initiates the CTE.
  • cte_name is the name of the CTE.
  • (SELECT * FROM table_name WHERE condition) defines the CTE's content.
  • SELECT * FROM cte_name is the main query that retrieves data from the CTE.

Recursive CTEs

A recursive CTE refers to itself and allows repeated execution of queries, building the result in stages—useful for hierarchical data such as org charts or product categories.

WITH expression_name (column_list) AS (
    -- Anchor member
    initial_query  

    UNION ALL

    -- Recursive member that references expression_name
    recursive_query  
)
SELECT *
FROM expression_name;
  • expression_name – name of the CTE.
  • (column_list) – optional list of columns in the result set.
  • Anchor member – starting point of the recursion.
  • UNION ALL – combines the result sets of the anchor and recursive members.

Multiple CTEs

You can define multiple CTEs in a single query, each one able to build on the previous:

WITH cte1 AS (
    SELECT ...
),
cte2 AS (
    SELECT ...
    FROM cte1
),
cte3 AS (
    SELECT ...
    FROM cte2
)
SELECT *
FROM cte3;

Here, cte1, cte2, cte3, etc. are separate CTEs chained together.

How Do Advanced CTE Patterns Support Modern Data Engineering?

Modern data engineering workflows have evolved beyond basic CTE implementations to support complex transformation patterns and real-time data processing. These advanced patterns leverage CTEs within sophisticated data architectures to enable more efficient and maintainable data pipelines.

Ephemeral Materialization in dbt

The rise of transformation tools like dbt has revolutionized how CTEs function in data modeling. Ephemeral materialization allows dbt models to compile into CTEs rather than physical tables, creating zero-footprint transformations that reduce storage costs while maintaining data lineage.

-- dbt model configured as ephemeral
{{ config(materialized='ephemeral') }}

WITH customer_metrics AS (
    SELECT 
        customer_id,
        SUM(order_value) AS lifetime_value,
        COUNT(*) AS order_count
    FROM {{ ref('orders') }}
    GROUP BY customer_id
)
SELECT * FROM customer_metrics

This pattern enables modular SQL development where CTEs serve as reusable components across multiple models, improving code maintainability and reducing redundancy in transformation logic.

CTE-Based Incremental Loading

Modern data platforms utilize CTEs as the foundation for change data capture patterns, enabling efficient incremental data loading strategies. This approach minimizes computational overhead by processing only modified records.

WITH new_records AS (
    SELECT *
    FROM source_table
    WHERE updated_at > (SELECT MAX(updated_at) FROM target_table)
),
merged_data AS (
    SELECT * FROM target_table
    UNION ALL
    SELECT * FROM new_records
)
SELECT * FROM merged_data

This pattern proves particularly effective in high-volume environments where full table refreshes become computationally expensive, enabling sub-minute data latencies for business-critical analytics.

AI-Driven CTE Generation

Emerging AI-powered tools now automatically generate optimized CTE queries from natural language descriptions, reducing the time data engineers spend writing complex SQL. These systems analyze query patterns and automatically structure CTEs for optimal performance.

The integration of AI in CTE development represents a significant shift toward automated query optimization, where intelligent systems can suggest CTE structures based on data characteristics and usage patterns.

What Are Common CTE Use-Case Examples?

  • Recursive CTEs help query hierarchical datasets like employee org charts.
  • CTEs serve as an alternative to creating temporary views in a database.
  • A CTE lets you perform the same calculations multiple times within a single query without repetition.

How Do Common Table Expressions Compare to Subqueries and Views?

Feature Common Table Expression Subquery View
Definition Temporary result set used within a larger query Query nested inside another query Virtual table representing a stored query
Naming Mandatory Optional Mandatory
Syntax WITH statement WHERE, FROM, HAVING clauses CREATE VIEW statement
Re-usability Multiple times within the same query Only once within the parent query Across multiple queries
Recursiveness Supports recursion Not recursive Can be recursive if based on recursive query

When Should You Not Use a CTE?

  • When query performance is paramount and readability is secondary.
  • If you are new to SQL and find CTE syntax complex.
  • Avoid recursive CTEs with very large data volumes—they can be expensive to execute.

Which Data Platforms Support CTEs?

  • PostgreSQL
  • MySQL
  • Oracle Database
  • SQLite
  • SQL Server
  • MariaDB
  • IBM Db2

How Do Database-Specific CTE Optimizations Impact Performance?

Different database management systems implement CTEs with varying optimization strategies, significantly affecting performance in large-scale operations. Understanding these platform-specific behaviors enables data engineers to make informed decisions about CTE implementation and optimization.

PostgreSQL Materialization Control

PostgreSQL offers explicit materialization control through the MATERIALIZED and NOT MATERIALIZED keywords, providing granular optimization capabilities:

WITH MATERIALIZED sales_summary AS (
    SELECT region, SUM(revenue) AS total_revenue
    FROM sales_data
    GROUP BY region
)
SELECT * FROM sales_summary
WHERE total_revenue > 100000;

Materialized CTEs cache intermediate results, proving beneficial when CTEs are referenced multiple times within a query. However, materialization can create optimization barriers that prevent predicate pushdown from outer queries.

Cloud Data Warehouse Optimizations

Modern cloud platforms like Snowflake and BigQuery implement sophisticated CTE optimization strategies. Snowflake's query optimizer automatically determines optimal materialization strategies based on cost analysis, while BigQuery leverages predicate pushdown and dynamic partitioning to minimize I/O overhead.

These optimizations enable CTEs to operate efficiently at petabyte scales, with cloud platforms processing terabytes of data through CTE-based transformations in minutes rather than hours.

Database-Specific Best Practices

Each database platform requires tailored optimization approaches:

  • PostgreSQL: Use NOT MATERIALIZED for single-reference CTEs over large datasets to enable predicate pushdown
  • SQL Server: Configure parallel execution parameters to prevent premature parallelization in CTE-heavy workloads
  • MySQL: Leverage automatic materialization for multiply-referenced CTEs to improve performance
  • Oracle: Apply INLINE hints to enable optimization across CTE boundaries

Understanding these platform-specific behaviors allows data engineers to optimize CTE performance based on their specific database environment and workload characteristics.

What Are the Key Performance Considerations for CTEs?

Avoid Unnecessary Use

If a subquery is referenced only once, using a CTE may add needless complexity.

Ensure Database Support

Most modern RDBMS support CTEs, but some legacy systems (e.g., MySQL 5.7) do not.

Complexity of Recursive Queries

For very complex recursion over large datasets, consider procedural extensions such as PL/SQL (Oracle) or T-SQL (SQL Server).

How Do CTEs Apply in the ETL Process?

Extract, transform, and load (ETL) integrates data from multiple sources into a centralized destination. CTEs are especially helpful during the transform step by simplifying complex SQL and enhancing query re-usability.

In an ETL pipeline, tools like Airbyte can handle data movement. It offers 400+ pre-built connectors to extract and load data. After loading, you can leverage CTEs for in-warehouse transformations.

To implement ETL programmatically, Airbyte provides the open-source Python library PyAirbyte, which lets you extract data with Airbyte connectors and load it into SQL stores such as Postgres or Snowflake.

Airbyte

Key Airbyte Features

  • AI-powered Connector Builder – accelerates custom connector creation with intelligent suggestions.
  • Streamlined GenAI Workflows – load semi-structured or unstructured data directly into vector-store destinations.
  • Change Data Capture (CDC) – incrementally replicates changes from source to destination, keeping them in sync.
  • Deployment Flexibility – choose Self-managed, Cloud-hosted, or Hybrid deployment options.

Conclusion

Common Table Expressions in SQL are versatile tools that help you manage large queries by breaking them into manageable parts. They are invaluable for data integration tasks, particularly for data transformation and cleaning.

This guide has covered CTE types, benefits, syntax, and use cases to help you decide when and how to use them. With these concepts, you can query and analyze data more effectively in your SQL workflows.

Limitless data movement with free Alpha and Beta connectors
Introducing: our Free Connector Program
The data movement infrastructure for the modern data teams.
Try a 14-day free trial