The set of steps taken to retrieve the results of a query is called the query plan. Previous articles in this series show how to use the EXPLAIN command and how to read simple query plans .
The query plan is organized as a tree structure. Each individual step is a node in the tree. Each node represents a method (operation) that the database uses internally to process queries. These methods are the internal functions that PostgreSQL uses to execute an SQL query. This article briefly covers the most important nodes of the query plan tree. Broadly, there are three classes of nodes - scan nodes, join nodes, and auxiliary nodes.
Scan Nodes Scan nodes are used to read the data stored in a table. There are 4 types of Scan Nodes:
Sequential Scan and Parallel Sequential Scan Index-scan Index-only scan Bitmap scan The planner chooses the right type of Scan Node depending on the query, the data, and the availability of suitable indices. The output resultset of a Scan Node is a set of rows from the scanned table. The article PostgreSQL Query Plans for Reading Tables discusses these nodes in detail along with practical examples.
Join Nodes Join nodes are used to join two or more tables. There are three types of joining methods:
Nested Loop Join Hash Join Merge Join The planner decides which Join Node to use based on the JOIN clause in the query, the size of the joined tables, and the availability of the right indices. The output resultset of a Join Node is a set of joined rows. The article PostgreSQL Query Plans for Joining Tables explains these nodes in detail and with practical examples.
Auxiliary Nodes Auxiliary nodes include many different types of operations, such as sorting, imposing limits and uniqueness constraints, merging output of parallelly executed operations, and so on. Auxiliary nodes do not produce their own rows. They are responsible for further processing on the resultset output by other nodes, like Scan Nodes and Join Nodes.
Some of the commonly used Auxiliary nodes are:
Aggregation nodes These nodes are used to aggregate data, for example, when the query has a GROUP BY clause. They are also used for aggregation operations like SUM, MAX, etc. There are different types of aggregation nodes:
Aggregate HashAggregate GroupAggregate Unique HashAggregate can handle unsorted data while GroupAggregate needs sorted data. The article PostgreSQL Query Plans for Aggregating Data discusses these nodes further with examples.
Sort Nodes The planner uses Sorting nodes to sort rows. There are different types of sort nodes:
Quick sort External Disk sort Top-N Heapsort Incremental sort Quick sort and External Disk sort use the quicksort algorithm while Top-N heapsort uses the heapsort algorithm. The article Postgres Query Plans for Sorting Data explains the different types of sort nodes with examples.
Gather nodes On multi-threaded systems, it can be helpful to spawn child processes and split the task (such as reading or sorting data) between them. In such cases, the output of each child process needs to be combined. Gather nodes are used to combine the output from parallel child processes. There are two types of gather nodes:
Gather Merge - Gather merge is used when the child processes return sorted data. Their individual outputs must be sorted while combining. The resultset of this node is sorted. Gather - This is the default method, it is used when the output of the child processes is not sorted. This node doesn't return sorted rows. Gather nodes never feature in isolation. The articles discussing query plans on reading data tables , query plans for aggregating data rows , and query plans for sorting data rows feature examples which use the Gather and Gather Merge nodes.
Limit nodes These are used to impose a limit on the number of rows returned. Often, when only a few nodes are to be returned, the planner is able to optimize the query to use a more efficient method. The article on Query Plans for Joining Tables includes examples which feature the use of the Limit node.
Materialize The Materialize node caches the result of an operation, so that it can be accessed quickly by the parent node. For example, one of the methods of joining two tables involves scanning the outer table row by row. Each row of the outer table is compared against all the rows of the inner table. In this case, the entire contents of the inner table are accessed repeatedly. Thus, it is more efficient to cache the results of the operation that scans the inner table, so as to speed up the process. The Materialize node does that. The article on Query Plans for Joining Tables features the use of the Materialize node.
Memoize The Memoize node is similar to the Materialize node discussed above. However, Memoize is specific to the Nested Loop Join method in the scenario that the inner table is scanned using an index scan. The equality condition (of the JOIN) is imposed while scanning using the index - this is called a parametrized index scan. Thus, performance can be improved if the result of this parametrized index scan can be cached. The Memoize node, which was introduced in Postgres 14, does this. The article on Query Plans for Joining Tables features the use of the Memoize node.
The next articles in he series feature all these nodes in action.
Limitless data movement with free Alpha and Beta connectors
Introducing: our Free Connector Program ->