Common Table Expressions (CTE): Syntax, Types, & Examples
You can use SQL statements to store, retrieve, manipulate, and analyze data efficiently. To write queries in a cleaner and more organized way in SQL, you can use Common Table Expressions. CTEs enable you to simplify complicated SQL queries in an understandable form. Here, you will learn about CTEs in detail, along with their syntax, use cases, and limitations, to organize your SQL code blocks and query data effectively.
What Are Common Table Expressions in SQL?
In SQL, a CTE is a temporary result set that is used in the context of a larger query and exists for a short duration. It can be referenced within SELECT, INSERT, UPDATE, or DELETE statements. CTEs help you simplify complex queries by breaking them down into smaller parts. These fragments make long SQL statements more readable and understandable. However, you cannot store CTE results, as they disappear immediately after query execution.
Why Should You Use Common Table Expressions in SQL?
CTEs enable you to effectively manage complex queries in long SQL code. Some of the prominent benefits are:
- You can label different blocks of code to understand the purpose of each part of long queries.
- They facilitate the logical organization of SQL queries, making it easier to identify and resolve bugs.
- You can reference the same CTE across multiple parts of a query without rewriting the same sub-query repeatedly.
What Is the Syntax of a Common Table Expression?
CTEs are defined using the keyword WITH
, followed by the name of the CTE and the query that defines the result set:
WITH cte_name (column1, column2, ...) AS (
SELECT ...
FROM ...
WHERE ...
)
Where:
- cte_name – the name you give the CTE.
- (column1, column2, …) – names of the columns returned by the CTE (optional).
- SELECT … FROM … WHERE … – the query that produces the result set.
What Are the Different Types of CTEs in Data Engineering?
There are different types of CTEs in data engineering, some of which are discussed below.
Non-recursive CTEs
Non-recursive CTEs don't refer to themselves repeatedly. They are typically used during transformations, filtering, or aggregation operations.
WITH cte_name AS (
SELECT *
FROM table_name
WHERE condition
)
SELECT *
FROM cte_name
WHERE additional_condition;
WITH
initiates the CTE.cte_name
is the name of the CTE.(SELECT * FROM table_name WHERE condition)
defines the CTE's content.SELECT * FROM cte_name
is the main query that retrieves data from the CTE.
Recursive CTEs
A recursive CTE refers to itself and allows repeated execution of queries, building the result in stages—useful for hierarchical data such as org charts or product categories.
WITH expression_name (column_list) AS (
-- Anchor member
initial_query
UNION ALL
-- Recursive member that references expression_name
recursive_query
)
SELECT *
FROM expression_name;
- expression_name – name of the CTE.
- (column_list) – optional list of columns in the result set.
- Anchor member – starting point of the recursion.
- UNION ALL – combines the result sets of the anchor and recursive members.
Multiple CTEs
You can define multiple CTEs in a single query, each one able to build on the previous:
WITH cte1 AS (
SELECT ...
),
cte2 AS (
SELECT ...
FROM cte1
),
cte3 AS (
SELECT ...
FROM cte2
)
SELECT *
FROM cte3;
Here, cte1, cte2, cte3, etc. are separate CTEs chained together.
How Do Advanced CTE Patterns Support Modern Data Engineering?
Modern data engineering workflows have evolved beyond basic CTE implementations to support complex transformation patterns and real-time data processing. These advanced patterns leverage CTEs within sophisticated data architectures to enable more efficient and maintainable data pipelines.
Ephemeral Materialization in dbt
The rise of transformation tools like dbt has revolutionized how CTEs function in data modeling. Ephemeral materialization allows dbt models to compile into CTEs rather than physical tables, creating zero-footprint transformations that reduce storage costs while maintaining data lineage.
-- dbt model configured as ephemeral
{{ config(materialized='ephemeral') }}
WITH customer_metrics AS (
SELECT
customer_id,
SUM(order_value) AS lifetime_value,
COUNT(*) AS order_count
FROM {{ ref('orders') }}
GROUP BY customer_id
)
SELECT * FROM customer_metrics
This pattern enables modular SQL development where CTEs serve as reusable components across multiple models, improving code maintainability and reducing redundancy in transformation logic.
CTE-Based Incremental Loading
Modern data platforms utilize CTEs as the foundation for change data capture patterns, enabling efficient incremental data loading strategies. This approach minimizes computational overhead by processing only modified records.
WITH new_records AS (
SELECT *
FROM source_table
WHERE updated_at > (SELECT MAX(updated_at) FROM target_table)
),
merged_data AS (
SELECT * FROM target_table
UNION ALL
SELECT * FROM new_records
)
SELECT * FROM merged_data
This pattern proves particularly effective in high-volume environments where full table refreshes become computationally expensive, enabling sub-minute data latencies for business-critical analytics.
AI-Driven CTE Generation
Emerging AI-powered tools now automatically generate optimized CTE queries from natural language descriptions, reducing the time data engineers spend writing complex SQL. These systems analyze query patterns and automatically structure CTEs for optimal performance.
The integration of AI in CTE development represents a significant shift toward automated query optimization, where intelligent systems can suggest CTE structures based on data characteristics and usage patterns.
What Are Common CTE Use-Case Examples?
- Recursive CTEs help query hierarchical datasets like employee org charts.
- CTEs serve as an alternative to creating temporary views in a database.
- A CTE lets you perform the same calculations multiple times within a single query without repetition.
How Do Common Table Expressions Compare to Subqueries and Views?
Feature | Common Table Expression | Subquery | View |
---|---|---|---|
Definition | Temporary result set used within a larger query | Query nested inside another query | Virtual table representing a stored query |
Naming | Mandatory | Optional | Mandatory |
Syntax | WITH statement |
WHERE , FROM , HAVING clauses |
CREATE VIEW statement |
Re-usability | Multiple times within the same query | Only once within the parent query | Across multiple queries |
Recursiveness | Supports recursion | Not recursive | Can be recursive if based on recursive query |
When Should You Not Use a CTE?
- When query performance is paramount and readability is secondary.
- If you are new to SQL and find CTE syntax complex.
- Avoid recursive CTEs with very large data volumes—they can be expensive to execute.
Which Data Platforms Support CTEs?
- PostgreSQL
- MySQL
- Oracle Database
- SQLite
- SQL Server
- MariaDB
- IBM Db2
How Do Database-Specific CTE Optimizations Impact Performance?
Different database management systems implement CTEs with varying optimization strategies, significantly affecting performance in large-scale operations. Understanding these platform-specific behaviors enables data engineers to make informed decisions about CTE implementation and optimization.
PostgreSQL Materialization Control
PostgreSQL offers explicit materialization control through the MATERIALIZED
and NOT MATERIALIZED
keywords, providing granular optimization capabilities:
WITH MATERIALIZED sales_summary AS (
SELECT region, SUM(revenue) AS total_revenue
FROM sales_data
GROUP BY region
)
SELECT * FROM sales_summary
WHERE total_revenue > 100000;
Materialized CTEs cache intermediate results, proving beneficial when CTEs are referenced multiple times within a query. However, materialization can create optimization barriers that prevent predicate pushdown from outer queries.
Cloud Data Warehouse Optimizations
Modern cloud platforms like Snowflake and BigQuery implement sophisticated CTE optimization strategies. Snowflake's query optimizer automatically determines optimal materialization strategies based on cost analysis, while BigQuery leverages predicate pushdown and dynamic partitioning to minimize I/O overhead.
These optimizations enable CTEs to operate efficiently at petabyte scales, with cloud platforms processing terabytes of data through CTE-based transformations in minutes rather than hours.
Database-Specific Best Practices
Each database platform requires tailored optimization approaches:
- PostgreSQL: Use
NOT MATERIALIZED
for single-reference CTEs over large datasets to enable predicate pushdown - SQL Server: Configure parallel execution parameters to prevent premature parallelization in CTE-heavy workloads
- MySQL: Leverage automatic materialization for multiply-referenced CTEs to improve performance
- Oracle: Apply
INLINE
hints to enable optimization across CTE boundaries
Understanding these platform-specific behaviors allows data engineers to optimize CTE performance based on their specific database environment and workload characteristics.
What Are the Key Performance Considerations for CTEs?
Avoid Unnecessary Use
If a subquery is referenced only once, using a CTE may add needless complexity.
Ensure Database Support
Most modern RDBMS support CTEs, but some legacy systems (e.g., MySQL 5.7) do not.
Complexity of Recursive Queries
For very complex recursion over large datasets, consider procedural extensions such as PL/SQL (Oracle) or T-SQL (SQL Server).
How Do CTEs Apply in the ETL Process?
Extract, transform, and load (ETL) integrates data from multiple sources into a centralized destination. CTEs are especially helpful during the transform step by simplifying complex SQL and enhancing query re-usability.
In an ETL pipeline, tools like Airbyte can handle data movement. It offers 400+ pre-built connectors to extract and load data. After loading, you can leverage CTEs for in-warehouse transformations.
To implement ETL programmatically, Airbyte provides the open-source Python library PyAirbyte, which lets you extract data with Airbyte connectors and load it into SQL stores such as Postgres or Snowflake.
Key Airbyte Features
- AI-powered Connector Builder – accelerates custom connector creation with intelligent suggestions.
- Streamlined GenAI Workflows – load semi-structured or unstructured data directly into vector-store destinations.
- Change Data Capture (CDC) – incrementally replicates changes from source to destination, keeping them in sync.
- Deployment Flexibility – choose Self-managed, Cloud-hosted, or Hybrid deployment options.
Conclusion
Common Table Expressions in SQL are versatile tools that help you manage large queries by breaking them into manageable parts. They are invaluable for data integration tasks, particularly for data transformation and cleaning.
This guide has covered CTE types, benefits, syntax, and use cases to help you decide when and how to use them. With these concepts, you can query and analyze data more effectively in your SQL workflows.