Common Table Expressions (CTE): Syntax, Types, & Examples

November 28, 2024
20 min read

You can use SQL statements to store, retrieve, manipulate, and analyze data efficiently. To write queries in a cleaner and more organized way in SQL, you can use Common Table Expressions. CTEs enable you to simplify complicated SQL queries in an understandable form. Here, you will learn about CTEs in detail, along with their syntax, use cases, and limitations, to organize your SQL code blocks and query data effectively.

What are Common Table Expressions in SQL?

CTE (Common Table Expression) in SQL

In SQL, CTE is a temporary result set that is used in the context of a larger query and exists for a short duration. It can be referenced within SELECT, INSERT, UPDATE, or DELETE statements. CTEs help you simplify complex queries by breaking them down into smaller parts. These fragments make long SQL statements more readable and understandable. However, you cannot store CTE results, as they disappear immediately after query execution.

Why use Common Table Expression in SQL?

CTEs enable you to effectively manage complex queries in long SQL codes. Some of the prominent benefits of using CTEs are as follows:

  • With the help of CTEs, you can label different blocks of code to understand the purpose of each part of long queries.
  • It facilitates the logical organization of SQL queries, making it easier for you to identify and resolve bugs in the code.
  • You can use the same CTE across multiple parts of a query without rewriting the same subquery repeatedly.

What is the Syntax Of Common Table Expressions (CTE)?

CTEs are defined using the keyword WITH followed by the name of that CTE and the query that defines the result set. The syntax for writing common table expressions is as follows:


WITH cte_name (column1, column2, ...) AS (
    SELECT ...
    FROM ...
    WHERE ...
)

Here, 

  • cte_name: The name of the CTE, which you can use to refer to the result set.
  • (column1, column2, ...): Name of the columns in the CTE.
  • SELECT ... FROM ... WHERE ...: The query that defines the CTE, including various SQL clauses.

Types of CTEs in Data Engineering

There are different types of CTE in data engineering, some of which are discussed below:

Non-recursive CTEs

Non-recursive CTE does not refer to itself repeatedly. It is usually used during transformations, filtering, or aggregation operations. The syntax for non-recursive CTE is as follows:


WITH cte_name AS (
   SELECT * 
   FROM table_name
   WHERE condition
)

SELECT * 
FROM cte_name
WHERE additional_condition;
  • WITH keyword initiates CTE
  • cte_name is the name of CTE.
  • (SELECT * FROM table_name WHERE condition) query defines the CTE’s content. It selects all columns from table_name where a specific condition is met.
  • SELECT * FROM cte_name is the main query that retrieves all columns from the CTE result set (cte_name)

Recursive CTEs

A recursive CTE refers to itself and allows repeated execution of queries, developing the result in stages. You can utilize these CTEs to query hierarchical or tree-structured data, such as organizational charts, product categories, or biological classification. Recursive CTEs work in repetitive loops and functions until the conditions are fulfilled. To stop this CTE, you need to use the WHERE clause in the code.

The syntax for recursive CTE is as follows:


WITH expression_name (column_list)
AS
(
    -- Anchor Member
    initial_query  
    
    -- Combine the all Result Set
    UNION ALL
    
    -- Recursive member that references expression_name.
    recursive_query  
)
-- references expression name
SELECT *
FROM  expression_name

Here,

  • Expression_name is the name of the CTE
  • ( column_list) is the list of columns within the result set generated by CTE. These are optional fields and should be specified if the columns are derived or calculated.
  • Anchor Member is the starting point for a recursive SQL query and defines the initial set of rows.
  • UNION ALL command helps you to combine the results of two or more SELECT statements from different tables.

Multiple CTEs

You can define multiple CTEs in a single query. During this process, each CTE can be related to the previous one, forming a chain of multiple CTEs. The syntax for multiple CTEs can be given as:


WITH cte1 AS (
    SELECT ...
),
cte2 AS (
    SELECT ...
    FROM cte1
),
cte3 AS (
    SELECT ...
    FROM cte2
)
SELECT *
FROM cte3;

Here, cte1, cte2, cte3, etc., are names of different CTEs.

Few CTE Use Case Examples

Some use cases of CTEs are as follows:

  • Recursive CTEs help you query hierarchical datasets like the business employee dataset.
  • You can use CTE as an alternative to creating a temporary view in the database.
  • CTE enables you to perform the same calculations multiple times within a query.

Common Table Expression Vs. Subquery Vs. Views

Here is a brief comparison of common table expressions, subqueries, and views to help you understand their differences.

Features

Common Table Expression

Subquery

Views

Definition

Temporary result set used within a larger query. 

Subqueries are nested within the larger queries.  Views are virtual tables that represent stored queries.
Naming Convention It is mandatory to name CTEs. It is not mandatory to name a subquery. It is mandatory to name views.
Syntax Statement WITH statement WHERE, FROM, HAVING Clauses CREATE VIEW statement
Reusability You can use CTEs multiple times within the same query. You can use subquery only once. You can utilize views across multiple queries.
Recursiveness They are recursive. They are not recursive. You can create recursive views from recursive queries.

When Not to Use Common Table Expression?

You should not use CTEs in the following scenarios: 

  • When you need to prioritize high-performance queries over code readability, you should avoid using CTEs.
  • If you are a beginner in SQL, you may find CTE syntax complex, which can lead to errors in data querying.
  • Avoid using recursive CTEs for larger data volumes, as they can be difficult to execute and manage. 

Data Platforms That Support CTEs

Some data platforms that support CTEs are as follows:

  • PostgreSQL
  • MySQL
  • Oracle Database
  • SQLite
  • SQL Server
  • MariaDB
  • IBM Db2

Are There Performance Considerations When Using CTEs?

Yes, there are several performance considerations that you need to keep in mind while using CTEs in the data engineering field. Here are some of the considerations:

Avoid Unnecessary Use

If you are using a subquery only once within a larger query, you should avoid using CTEs as they can make the query complicated. Alternatively, you can use the subquery directly within the main query.

Ensure Database Support

Most modern RDBMS support CTEs. However, some legacy systems, such as MySQL 5.7, do not allow you to execute CTEs. As a result, you need to find alternate ways to structure your queries white using such data systems.

Complexity of Recursive Queries

While CTEs help execute recursive queries effectively, you can better manage complex recursive operations using procedural languages. For this, you can use languages like PL/SQL (Oracle) or T-SQL (MS SQL Server). You should practice this, especially when querying large datasets.

Application of CTE in ETL Process

Extract, transform, and load, or ETL, is a prominent data engineering technique used for integrating data from multiple sources into a centralized destination. CTEs play a critical role during the transformation step by simplifying complex SQL queries and enhancing query reusability.

In an ETL pipeline, tools like Airbyte can be used to handle data movement processes. It offers 400+ pre-built connectors, which you can use to extract and load data from various sources into the desired destination. Following this, you can leverage CTEs to perform data modifications within the SQL environments.

To implement the ETL process programmatically, Airbyte offers an open-source Python library, PyAirbyte. This library offers a set of utilities that you can use in the Python ecosystem. With the help of PyAirbyte, you can extract data from a variety of sources using Airbyte connectors and load it to SQL caches like Postgres or Snowflake.

Airbyte

Some prominent features of Airbyte are as follows:

  • AI-powered Connector Builder: You can use Connector Builder with AI assistant to speed up the custom connector creation process. The AI assistant pre-fills the configuration fields and provides intelligent suggestions to fine-tune the process.
  • Streamline GenAI Workflows: While using Airbyte, you can directly load semi-structured and unstructured data into vector store destinations. This enables you to manage your GenAI workflows effectively.
  • Change Data Capture (CDC): The CDC feature allows you to incrementally capture changes made at the sources and replicate them in the destination system. This ensures data consistency by keeping the source and target system in synchronization.
  • Deployment Flexibility: You can opt for any of the three options to deploy Airbyte: Self-managed, Cloud-hosted, and Hybrid. In the Self-managed option, you can implement Airbyte on your local system or your own infrastructural setup. In the Cloud-hosted option, Airbyte manages all the infrastructural operations. 

Conclusion

Common table expressions in SQL are versatile features that help you manage large queries by breaking them into manageable parts. They can be immensely helpful while performing data integration, especially for data transformation and cleaning.

This blog explains how you can use CTEs in data engineering to organize and handle large SQL queries. It provides comprehensive information on use cases, benefits, and types of CTEs to aid you in deciding which CTE to use according to your requirements. You can use this guide to query and analyze data effectively using SQL.

Limitless data movement with free Alpha and Beta connectors
Introducing: our Free Connector Program
The data movement infrastructure for the modern data teams.
Try a 14-day free trial