
Important Nodes of the Query Plan Tree in PostgreSQL


May 17, 2024


The set of steps taken to retrieve the results of a query is called the query plan. Previous articles in this series show how to use the EXPLAIN command and how to read simple query plans.

The query plan is organized as a tree. Each step is a node in the tree, and each node represents an operation that PostgreSQL performs internally while executing an SQL query. This article briefly covers the most important nodes of the query plan tree. Broadly, there are three classes of nodes: scan nodes, join nodes, and auxiliary nodes.

Scan Nodes

Scan nodes are used to read the data stored in a table. There are four types of Scan Nodes:

Sequential Scan and Parallel Sequential Scan
Index Scan
Index Only Scan
Bitmap Scan

The planner chooses the right type of Scan Node depending on the query, the data, and the availability of suitable indices. The output resultset of a Scan Node is a set of rows from the scanned table. The article PostgreSQL Query Plans for Reading Tables discusses these nodes in detail along with practical examples.

Join Nodes

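Join nodes are used to join tables. Each join node combines two inputs, which can be tables or the output of other nodes, so a query that joins more than two tables uses several join nodes. There are three joining methods: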

Nested Loop Join
Hash Join
Merge Join

The planner decides which Join Node to use based on the JOIN clause in the query, the size of the joined tables, and the availability of the right indices. The output resultset of a Join Node is a set of joined rows. The article PostgreSQL Query Plans for Joining Tables explains these nodes in detail and with practical examples.
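To make the difference between two of the joining methods concrete, here is a toy model in Python. It is only a sketch of the general ideas behind a nested loop join and a hash join, not PostgreSQL's implementation; the function names and the `customers`/`orders` sample rows are hypothetical.

```python
from collections import defaultdict

def nested_loop_join(outer, inner, key_outer, key_inner):
    """Compare every outer row against every inner row."""
    return [(o, i) for o in outer for i in inner
            if o[key_outer] == i[key_inner]]

def hash_join(outer, inner, key_outer, key_inner):
    """Build a hash table on the inner rows once, then probe it per outer row."""
    table = defaultdict(list)
    for i in inner:
        table[i[key_inner]].append(i)
    return [(o, i) for o in outer for i in table.get(o[key_outer], [])]

# Hypothetical sample data
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 1, "total": 12},
    {"customer_id": 3, "total": 7},
]

nl_rows = nested_loop_join(customers, orders, "id", "customer_id")
hj_rows = hash_join(customers, orders, "id", "customer_id")
```

Both functions return the same joined rows. The nested loop does one comparison per pair of rows, while the hash join reads each input once, which is why the size of the tables matters to the planner.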

Auxiliary Nodes

Auxiliary nodes cover many different operations, such as sorting, imposing limits and uniqueness constraints, and merging the output of operations executed in parallel. Auxiliary nodes do not read rows from tables themselves. They further process the resultset output by other nodes, like Scan Nodes and Join Nodes.

Some of the commonly used Auxiliary nodes are:

Aggregation nodes

These nodes are used to aggregate data, for example, when the query has a GROUP BY clause. They are also used for aggregation operations like SUM, MAX, etc. There are different types of aggregation nodes:

Aggregate
HashAggregate
GroupAggregate
Unique

HashAggregate can handle unsorted data while GroupAggregate needs sorted data. The article PostgreSQL Query Plans for Aggregating Data discusses these nodes further with examples.
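The difference in input requirements can be illustrated with a small Python sketch. This is a simplified model of the two strategies, not PostgreSQL's code; `hash_aggregate`, `group_aggregate`, and the sample rows are hypothetical names chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (category, quantity) rows in arbitrary order
rows = [("fruit", 3), ("veg", 5), ("fruit", 4), ("veg", 1)]

def hash_aggregate(rows):
    """Sum per key using a hash table; input order does not matter."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals

def group_aggregate(sorted_rows):
    """Sum per key by walking runs of equal keys; input must be sorted by key."""
    return {key: sum(value for _, value in group)
            for key, group in groupby(sorted_rows, key=itemgetter(0))}

hash_totals = hash_aggregate(rows)
group_totals = group_aggregate(sorted(rows, key=itemgetter(0)))
```

The grouped version only needs to remember one group at a time, but it gives wrong totals if the rows are not sorted first. The hashed version accepts any order at the cost of holding every group in memory.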

Sort Nodes

The planner uses Sort nodes to sort rows. The sort method depends on the size of the data and on the query:

Quick sort
External Disk sort
Top-N Heapsort
Incremental sort

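Quick sort uses the quicksort algorithm to sort rows in memory. External Disk sort is used when the rows do not fit in the memory available for sorting: it sorts the data in chunks and merges the sorted chunks using temporary files on disk. Top-N heapsort uses a heap to keep only the first N rows, which helps when the query has a LIMIT. Incremental sort is used when the input is already sorted on a leading subset of the sort keys. The article Postgres Query Plans for Sorting Data explains these with examples.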
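The idea behind a top-N sort can be sketched in Python as a bounded heap. This is a minimal illustration assuming numeric values and ascending order; `top_n` is a hypothetical helper, not a PostgreSQL function.

```python
import heapq

def top_n(values, n):
    """Return the n smallest values in ascending order.

    Keeps at most n values in a heap instead of sorting the whole input.
    Values are negated so the largest kept value sits at the top of the heap.
    """
    if n <= 0:
        return []
    heap = []
    for value in values:
        if len(heap) < n:
            heapq.heappush(heap, -value)
        elif value < -heap[0]:
            # The new value beats the worst kept value, so swap it in.
            heapq.heapreplace(heap, -value)
    return sorted(-v for v in heap)

values = [42, 7, 19, 3, 88, 15, 1]
smallest_three = top_n(values, 3)
```

Memory use is bounded by `n` rather than by the number of input rows, which is why this method suits queries that sort and then keep only a few rows.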

Gather nodes

On multi-core systems, it can be helpful to spawn child (worker) processes and split the task (such as reading or sorting data) between them. In such cases, the output of each child process needs to be combined. Gather nodes are used to combine the output from parallel child processes. There are two types of gather nodes:

Gather Merge - This is used when each child process returns sorted data. The sort order must be preserved while combining the individual outputs, so the resultset of this node is sorted.
Gather - This is the default method. It is used when the output of the child processes is not sorted, and it does not return rows in any particular order.

Gather nodes never feature in isolation. They always sit above the nodes whose work is split across the parallel processes. The articles on query plans for reading tables, aggregating data, and sorting data feature examples which use the Gather and Gather Merge nodes.
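The two ways of combining worker output can be modelled in a few lines of Python. This is a toy model: the `worker_outputs` lists stand in for the rows returned by three hypothetical child processes, and plain concatenation is a simplification of Gather, which in practice returns rows in whatever order the workers deliver them.

```python
import heapq
from itertools import chain

# Hypothetical output of three workers, each already sorted
worker_outputs = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]

def gather(outputs):
    """Combine worker outputs with no ordering guarantee."""
    return list(chain.from_iterable(outputs))

def gather_merge(outputs):
    """Merge sorted worker outputs into one sorted stream."""
    return list(heapq.merge(*outputs))

unordered = gather(worker_outputs)
ordered = gather_merge(worker_outputs)
```

The merge only compares the next row from each worker, so the combined output stays sorted without sorting everything again.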

Limit nodes

These are used to impose a limit on the number of rows returned. Often, when only a few rows are to be returned, the planner is able to optimize the query to use a more efficient method. The article on Query Plans for Joining Tables includes examples which feature the use of the Limit node.
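One reason a limit can make a query cheaper is that the node below it can stop producing rows early. The Python sketch below models this with a lazy iterator; `CountingScan` is a hypothetical stand-in for a scan node, not part of PostgreSQL.

```python
from itertools import islice

class CountingScan:
    """Hypothetical scan that yields rows lazily and counts how many it read."""

    def __init__(self, table):
        self.table = table
        self.rows_read = 0

    def __iter__(self):
        for row in self.table:
            self.rows_read += 1
            yield row

scan = CountingScan(range(1_000_000))

# The "limit" pulls only five rows from the scan, then stops asking for more.
first_five = list(islice(scan, 5))
```

After this runs, `scan.rows_read` is 5 even though the table has a million rows, because the scan is never asked for the rest.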

Materialize

The Materialize node caches the result of an operation, so that it can be accessed quickly by the parent node. For example, one of the methods of joining two tables involves scanning the outer table row by row. Each row of the outer table is compared against all the rows of the inner table, so the entire contents of the inner table are accessed repeatedly. It is therefore more efficient to cache the results of the operation that scans the inner table. The Materialize node does that. The article on Query Plans for Joining Tables features the use of the Materialize node.
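The effect of caching the inner side can be seen in a small Python model. It is a sketch under simplifying assumptions: `scan_inner` is a hypothetical expensive scan, and the counter only records how often that scan runs.

```python
scan_count = {"inner": 0}

def scan_inner():
    """Hypothetical expensive scan of the inner table."""
    scan_count["inner"] += 1
    return [(1, "a"), (2, "b"), (3, "c")]

def nested_loop(outer, inner_source):
    """Fetch the inner rows once per outer row and keep the matching pairs."""
    return [(o, i) for o in outer for i in inner_source() if o == i[0]]

outer_rows = [1, 2, 3, 2]

# Without caching: the inner table is scanned again for every outer row.
plain = nested_loop(outer_rows, scan_inner)
scans_without_cache = scan_count["inner"]

# With caching: scan once, keep the rows, and reuse them for every outer row.
scan_count["inner"] = 0
cached_rows = scan_inner()
materialized = nested_loop(outer_rows, lambda: cached_rows)
scans_with_cache = scan_count["inner"]
```

Both runs produce the same joined rows, but the uncached run performs one inner scan per outer row while the cached run performs a single scan.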

Memoize

The Memoize node is similar to the Materialize node discussed above. However, Memoize is specific to the Nested Loop Join method, typically when the inner table is scanned using an index scan. The equality condition of the JOIN is applied while scanning the index, using the value from the current outer row as a parameter - this is called a parameterized index scan. When the same parameter value occurs repeatedly in the outer table, performance improves if the result of the parameterized index scan is cached. The Memoize node, which was introduced in Postgres 14, does this. The article on Query Plans for Joining Tables features the use of the Memoize node.
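A per-parameter cache of this kind can be sketched in Python as follows. This is a simplified model with hypothetical names (`index`, `index_scan`, `nested_loop_memoize`); it uses an unbounded dictionary, whereas the real node works within a memory limit, which the sketch leaves out.

```python
# Hypothetical index on the inner table: join key -> matching inner rows
index = {1: ["a"], 2: ["b", "c"]}
lookups = []

def index_scan(key):
    """Hypothetical parameterized index scan; records every lookup performed."""
    lookups.append(key)
    return index.get(key, [])

def nested_loop_memoize(outer_keys):
    """Nested loop that caches the inner result for each distinct key."""
    cache = {}
    result = []
    for key in outer_keys:
        if key not in cache:
            cache[key] = index_scan(key)   # cache miss: run the index scan
        result.extend((key, row) for row in cache[key])
    return result

joined = nested_loop_memoize([1, 2, 1, 1, 2, 3])
```

The outer side has six rows but only three distinct keys, so the index is scanned three times. The benefit grows with the number of repeated values in the outer table and disappears when every key is unique.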

The next articles in the series feature all these nodes in action.


About the Author

Arun's interest in building new things from scratch led him to technology and to learning full stack development skills. He owes much of what he knows to well written documentation. He hopes to pay it forward by sharing his own learnings in the form of hands-on tutorials. His current interests are data analysis and AI/ML tools.