The PostgreSQL query planner uses Sort nodes to handle sorting operations. Sorting is necessary when the input query has an ORDER BY clause. It is also necessary when intermediate resultsets need to be sorted to be input to a node which expects sorted input - such as the GroupAggregate node.
This article discusses the nodes of the query plan tree which handle sorting. Sorting works on rows generated by other primary nodes, such as scans and joins, so it is expected that you are already familiar with those more preliminary concepts. They are discussed and illustrated in the previous articles of this series.
Sorting Nodes

Sorting is not a standalone operation; it does not generate rows by itself. Only rows generated by another operation, such as a scan or a join, can be sorted. Thus, Sort nodes are considered auxiliary nodes. There are different types of sorting nodes, as discussed below.
Sort

The default node for handling any sorting operation is the Sort node. Depending on the data to be sorted, the availability of appropriate indexes, the expected memory requirements of the sorting operation, and the amount of working memory available, Sort chooses one of a few different algorithms.
Quicksort

This is the standard method of sorting: it uses the quicksort algorithm.
The Sort node:

1. Receives the resultset of the child node, such as a join or a scan node.
2. Loads the entire dataset into memory.
3. Sorts it using the quicksort algorithm.

This is possible only when the resultset fits into the working memory. Because it is a memory-only operation, it is considered efficient. Thus, increasing work_mem increases the chances of using this method.
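The in-memory step can be sketched in Python. This is a toy illustration of the quicksort idea, not PostgreSQL's actual C implementation, and the sample values are hypothetical:

```python
def quicksort(rows, key=lambda r: r):
    """Toy quicksort: assumes the entire list fits in memory."""
    if len(rows) <= 1:
        return rows
    pivot = key(rows[len(rows) // 2])
    smaller = [r for r in rows if key(r) < pivot]
    equal   = [r for r in rows if key(r) == pivot]
    larger  = [r for r in rows if key(r) > pivot]
    return quicksort(smaller, key) + equal + quicksort(larger, key)

# Sort some sample "amount" values, as the Sort node would:
print(quicksort([300, 100, 200, 150]))  # → [100, 150, 200, 300]
```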
External Merge Sort

Sorting is a memory-intensive operation. In principle, the entire dataset being sorted needs to fit into memory. However, this is often not the case: the dataset is often too large to fit in working memory. When the available memory cannot completely accommodate the dataset, it becomes necessary to use the disk. This is called an External Merge Sort. Internally, it still uses the same quicksort algorithm, but it:
1. Splits the data into batches.
2. Fetches consecutive batches from disk.
3. Sorts these batches individually.
4. Finally, combines the sorted batches.

Top-N Heap Sort

This method of sorting is used when only a part (N rows) of the sorted output is to be returned. It creates a heap data structure in memory and uses it to keep track of the top N (sorted) rows. To use this method, it is important that the top N rows fit into memory. If they don't, a different sorting method is used.
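A minimal sketch of the top-N idea, using Python's standard-library heap utilities (`heapq.nsmallest` maintains a bounded heap internally, so memory depends on N rather than on the size of the input):

```python
import heapq

def top_n_rows(rows, n, key):
    """Return the n smallest rows by key without sorting the whole input;
    heapq.nsmallest keeps a heap of at most n elements."""
    return heapq.nsmallest(n, rows, key=key)

amounts = [42, 7, 99, 13, 55, 1]
print(top_n_rows(amounts, 3, key=lambda a: a))  # → [1, 7, 13]
```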
Incremental Sort

Incremental Sort is a newer node, available in PostgreSQL 13 and later. It handles cases where the sort operation receives input rows that are already partially sorted. This can be the case if the child node of the sort node is a Gather Merge or an Index Scan.
For example, consider a situation where the Sort node needs to sort the data by two columns, and the child node has already sorted it by the first column. In such cases, re-sorting the entire dataset again by both columns is wasteful. So, the planner chooses to use the Incremental Sort node instead of the regular Sort node. This takes the resultset (which is already sorted based on the first column), and sorts it further only according to the second column.
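The idea can be modelled in Python: because the input is already grouped by the first key, only one (usually small) group at a time needs to be held in memory and sorted by the second key. This is a conceptual sketch, not PostgreSQL's implementation:

```python
from itertools import groupby

def incremental_sort(rows, presorted_key, remaining_key):
    """Finish sorting rows that are already ordered by presorted_key:
    sort each group of equal presorted_key values by remaining_key."""
    result = []
    for _, group in groupby(rows, key=presorted_key):
        result.extend(sorted(group, key=remaining_key))  # only one group in memory
    return result

# (flight_id, amount) rows already sorted by flight_id:
rows = [(1, 300), (1, 100), (2, 50), (2, 40), (3, 10)]
print(incremental_sort(rows, lambda r: r[0], lambda r: r[1]))
# → [(1, 100), (1, 300), (2, 40), (2, 50), (3, 10)]
```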
The following sections illustrate the concepts discussed above with hands-on examples. These examples are based on the medium-sized demo flight tickets database. Other articles in this series also use the same database. Follow the instructions in the introductory article on reading query plans to connect to this database and try out the examples in the following sections.
Sort

Depending on the type of query and the available memory, the Sort node chooses which method to use. The following sections show the different methods in practice.
External Merge Sort

It commonly happens that the dataset is relatively large and the working memory insufficient. In such situations, the Sort node uses the External Merge Sort method.
Example 1

Consider a simple sorting operation:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY amount;
To process this query:
1. It first does a parallel sequential scan on the contents of the table ticket_flights.
2. Each parallel worker individually sorts its resultset. This sorting is done using the External Merge method. Notice that each worker uses around 33 MB of disk space.
3. The sorted resultsets are combined by the topmost Gather Merge node, which preserves the sort order in the final output.

The execution plan corresponding to this is below:
"Gather Merge (cost=151927.82..381420.88 rows=1966946 width=32) (actual time=767.816..1925.619 rows=2360335 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=150927.80..153386.48 rows=983473 width=32) (actual time=737.586..963.688 rows=786778 loops=3)"
" Sort Key: amount"
" Sort Method: external merge Disk: 35472kB"
" Worker 0: Sort Method: external merge Disk: 33520kB"
" Worker 1: Sort Method: external merge Disk: 33696kB"
" -> Parallel Seq Scan on ticket_flights (cost=0.00..29504.73 rows=983473 width=32) (actual time=0.013..110.702 rows=786778 loops=3)"
"Planning Time: 0.068 ms"
"Execution Time: 2062.364 ms"
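The behaviour seen in this plan can be modelled in Python. This is a toy sketch: `batch_size` stands in for work_mem, and pickled temporary files stand in for PostgreSQL's on-disk sort runs:

```python
import heapq
import pickle
import tempfile

def external_merge_sort(rows, batch_size):
    """Sort batches that fit in 'memory', spill each sorted run to a
    temporary file on disk, then k-way merge the runs."""
    run_files = []
    for start in range(0, len(rows), batch_size):
        run = sorted(rows[start:start + batch_size])  # in-memory sort of one batch
        f = tempfile.TemporaryFile()
        pickle.dump(run, f)                           # spill the sorted run to disk
        run_files.append(f)
    runs = []
    for f in run_files:
        f.seek(0)
        runs.append(pickle.load(f))                   # read the runs back
    return list(heapq.merge(*runs))                   # merging preserves sort order

print(external_merge_sort([5, 3, 8, 1, 9, 2, 7], batch_size=3))
# → [1, 2, 3, 5, 7, 8, 9]
```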
Quicksort

If the available working memory is sufficient to accommodate the dataset to be sorted, the planner can directly use the quicksort method.
Effect of work_mem

To check this, increase the working memory to 1 GB from the default value of 4 MB:
=# SET work_mem = '1024 MB';

Verify that the value has changed:

=# SHOW work_mem;
Example 2

Re-run the earlier query from Example 1:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY amount;
The new execution plan looks like this:
"Sort (cost=293121.42..299022.26 rows=2360335 width=32) (actual time=1323.414..1617.892 rows=2360335 loops=1)"
" Sort Key: amount"
" Sort Method: quicksort Memory: 282706kB"
" -> Seq Scan on ticket_flights (cost=0.00..43273.35 rows=2360335 width=32) (actual time=0.011..266.730 rows=2360335 loops=1)"
…
"Execution Time: 1727.954 ms"
Thus, after increasing the working memory:
1. It now does a sequential scan on the ticket_flights table to retrieve the data.
2. It sorts the resultset using the Sort node and the quicksort algorithm.

Notice the total amount of memory consumed: nearly 282 MB. This is comfortably less than the (increased) working memory. Thus, to improve sorting performance, consider increasing working memory so as to avoid spillover onto disk.
Before continuing, reset the working memory parameter to its original value:

=# RESET work_mem;
Top-N Heapsort

Top-N heapsort is used when only a limited number of rows are to be returned.
Example 3

To see it in practice, modify the previous query to return only the top 1000 rows:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY amount LIMIT 1000;
For this query:
1. The execution starts with a parallel sequential scan on the ticket_flights table.
2. Each parallel worker uses the top-N heapsort method to sort its dataset.
3. The sorted output of each worker is combined by a Gather Merge node, which preserves the sort order.
4. Finally, the topmost Limit node ensures that only the desired number of rows is returned.

The execution plan looks like this:
"Limit (cost=84427.52..84544.19 rows=1000 width=32) (actual time=352.008..358.456 rows=1000 loops=1)"
" -> Gather Merge (cost=84427.52..313920.57 rows=1966946 width=32) (actual time=352.007..358.367 rows=1000 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=83427.49..85886.18 rows=983473 width=32) (actual time=327.346..327.447 rows=1000 loops=3)"
" Sort Key: amount"
" Sort Method: top-N heapsort Memory: 172kB"
" Worker 0: Sort Method: top-N heapsort Memory: 172kB"
" Worker 1: Sort Method: top-N heapsort Memory: 169kB"
" -> Parallel Seq Scan on ticket_flights (cost=0.00..29504.73 rows=983473 width=32) (actual time=0.030..116.448 rows=786778 loops=3)"
"Planning Time: 0.094 ms"
"Execution Time: 358.534 ms"
Two-column Sort

It is common to sort the rows of a table based on the values of two columns. For multi-column sorts, rows are sorted first according to the first key (column), then according to the second key, and so on.
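As a quick model of this key-by-key behaviour, Python's tuple comparison works the same way (the sample rows are hypothetical):

```python
rows = [
    {"flight_id": 2, "amount": 150.0},
    {"flight_id": 1, "amount": 300.0},
    {"flight_id": 1, "amount": 100.0},
]
# Tuples compare element by element: flight_id decides first, and
# amount breaks ties, just like ORDER BY flight_id, amount.
ordered = sorted(rows, key=lambda r: (r["flight_id"], r["amount"]))
for r in ordered:
    print(r["flight_id"], r["amount"])
# → 1 100.0
#   1 300.0
#   2 150.0
```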
Example 4

To see multi-column sorts in action, sort the rows of the table ticket_flights by two columns, flight_id and amount:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY flight_id, amount;
The plan for this query is shown below:
"Gather Merge (cost=151927.82..381420.88 rows=1966946 width=32) (actual time=843.617..1993.398 rows=2360335 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=150927.80..153386.48 rows=983473 width=32) (actual time=796.806..1096.575 rows=786778 loops=3)"
" Sort Key: flight_id, amount"
" Sort Method: external merge Disk: 29776kB"
" Worker 0: Sort Method: external merge Disk: 35368kB"
" Worker 1: Sort Method: external merge Disk: 37560kB"
" -> Parallel Seq Scan on ticket_flights (cost=0.00..29504.73 rows=983473 width=32) (actual time=0.014..106.871 rows=786778 loops=3)"
"Planning Time: 0.084 ms"
"Execution Time: 2160.181 ms"
Starting from the bottommost leaf nodes, the steps of the query plan tree are:
1. The planner reads the data from the table using a (parallel) sequential scan.
2. Each worker separately sorts its resultset from the sequential scan. Because the available working memory cannot hold the entire resultset, the sort spills over onto the disk; hence the planner uses the External Merge method.
3. Finally, a Gather Merge node (which merges sorted data) combines the resultsets of the parallel workers.

Compare this plan to the plan in Example 1, where the planner used an External Merge sort to sort by a single column. Observe that:
- The only difference is the sort key, which is now two columns.
- The cost of the plan is identical in both cases.

Effect of Indexing

In the previous example, the planner used a sequential scan followed by a Sort node. It could be more efficient to use an index scan instead.
Example 5

To study the effect of indexing on sort performance, create a new index on both the columns used as sort keys in Example 4:

=# CREATE INDEX ticket_flights_flightid_amount ON ticket_flights (flight_id, amount);
CREATE INDEX
=# VACUUM ANALYZE ticket_flights;
Re-run the same query as in Example 4:
=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY flight_id, amount;
In the query plan below, notice that:
- The new index has already taken care of sorting the rows according to both the indexed columns.
- So, there is no need to sort the resultset again. Accordingly, the planner does just an index scan and returns the result:

"Index Scan using ticket_flights_flightid_amount on ticket_flights (cost=0.43..148981.46 rows=2360335 width=32) (actual time=0.153..1866.964 rows=2360335 loops=1)"
"Planning Time: 0.222 ms"
"Execution Time: 2010.230 ms"
Notice also that the cost of the query is now considerably less than earlier.
Sort Order

The query in the previous example sorted the rows of the table ticket_flights first according to the column flight_id and then according to the column amount. This sort operation used the index ticket_flights_flightid_amount on (flight_id, amount).
Example 6 - Exercise

Re-run the same query but with the sort order reversed: instead of sorting by flight_id and then amount, sort by amount and then flight_id:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY amount, flight_id;
Study the query plan and observe that:
- Because the indexed columns no longer correspond exactly to the ORDER BY columns, the planner no longer does an index scan of the table, and instead does a (parallel) sequential scan.
- The new query plan resembles the situation without an index: parallel workers sort the individual resultsets of the parallel sequential scans, and a Gather Merge node combines the sorted outputs of the workers.
- The cost of the query is also higher.

Thus, indexes help sorting performance, but only if they are the right type of index.
Incremental Sort

The query used in Examples 4 and 5 sorts the table first by the values in the column flight_id and then by the column amount:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY flight_id, amount;
Example 7

Add a third column, ticket_no, to the ORDER BY clause of the above query:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY flight_id, amount, ticket_no;
Study the resulting query plan and observe that:
- The planner is aware that the data is to be sorted according to the flight_id and amount columns, and there is an index on those columns. So, instead of sequentially scanning the table, it does an index scan.
- The index scan produces rows that are already sorted according to the index keys, flight_id and amount. So, it only needs to sort the rows according to the newly added third sort key, ticket_no. This is accomplished using the Incremental Sort method.
- Incremental Sort internally uses the quicksort algorithm.

The query plan is shown below:
"Incremental Sort (cost=1.35..255196.95 rows=2360335 width=32) (actual time=0.372..5710.081 rows=2360335 loops=1)"
" Sort Key: flight_id, amount, ticket_no"
" Presorted Key: flight_id, amount"
" Full-sort Groups: 33686 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB"
" Pre-sorted Groups: 39699 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB"
" -> Index Scan using ticket_flights_flightid_amount on ticket_flights (cost=0.43..148859.94 rows=2360335 width=32) (actual time=0.021..2306.928 rows=2360335 loops=1)"
"Planning Time: 0.075 ms"
"Execution Time: 5847.550 ms"
Example 8 - Exercise

Modify the query of the previous example slightly by changing the order of the sort keys:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY ticket_no, flight_id, amount;
The new query plan shows:
- It now uses a different index, ticket_flights_pkey. This is the primary key index which is already part of the table, and it covers the ticket_no and flight_id columns.
- Because a suitable index is already available for the modified query, the planner uses it to get presorted rows.
- It then uses Incremental Sort to sort the presorted rows according to the remaining sort key, the amount column.

Example 9 - Exercise
Write a query selecting all columns (SELECT *) and involving ORDER BY on a table with many columns, such as flights. Rewrite the same query to restrict the output to a few specific columns instead of doing SELECT *. Compare the performance in both cases. Then add indexes over the relevant columns in an attempt to get the planner to use an index-only scan.

Conclusion

This article discussed the key principles of how PostgreSQL sorts data, the nodes used to sort data, and their use cases. To summarize the learnings from the examples, some ideas that can help sorting performance are:
- Increasing working memory.
- Selecting only the needed columns.
- Having the right indexes.
- Limiting the number of rows.
- In the case of multi-column sorts, being careful with the order of sort keys: ORDER BY 1, 2 can perform differently from ORDER BY 2, 1.

The next article in the series discusses general pointers for improving PostgreSQL query performance.