The PostgreSQL query planner uses Sort nodes to handle sorting operations. Sorting is necessary when the input query has an ORDER BY clause. It is also necessary when intermediate resultsets need to be sorted to be input to a node which expects sorted input - such as the GroupAggregate node.
This article discusses the nodes of the query plan tree which handle sorting. Sorting works on rows generated by other primary nodes, such as scans and joins, so it is expected that you are already familiar with those more preliminary concepts. They are discussed and illustrated in the previous articles of this series.
Sorting Nodes

Sorting is not a standalone operation; it does not generate rows by itself. Only rows generated by another operation, such as a scan or a join, can be sorted. Thus, Sort nodes are considered auxiliary nodes. There are different types of sorting nodes, as discussed below.
Sort

The default node for handling any sorting operation is the Sort node. Depending on the data to be sorted, the availability of appropriate indexes, the expected memory requirements of the sorting operation, and the amount of working memory available, Sort chooses one of a few different algorithms.
Quicksort

This is the standard method of sorting: it uses the quicksort algorithm.
The Sort node:

1. Receives the resultset of the child node, such as a join or a scan node.
2. Loads the entire dataset into memory.
3. Sorts it using the quicksort algorithm.

This is possible only when the resultset fits into the working memory. Because it is a memory-only operation, it is considered efficient. Thus, increasing work_mem increases the chances of using this method.
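The in-memory step can be sketched in Python. This is a toy illustration of the quicksort idea, not PostgreSQL's actual C implementation, and the sample values are hypothetical:

```python
def quicksort(rows, key=lambda r: r):
    """Toy quicksort: assumes the entire list fits in memory."""
    if len(rows) <= 1:
        return rows
    pivot = key(rows[len(rows) // 2])
    smaller = [r for r in rows if key(r) < pivot]
    equal   = [r for r in rows if key(r) == pivot]
    larger  = [r for r in rows if key(r) > pivot]
    return quicksort(smaller, key) + equal + quicksort(larger, key)

# Sort some sample "amount" values, as the Sort node would:
print(quicksort([300, 100, 200, 150]))  # → [100, 150, 200, 300]
```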
External Merge Sort

Sorting is a memory-intensive operation. In principle, the entire dataset being sorted needs to fit into memory. However, this is often not the case: the dataset is often too large to fit in working memory. When the available memory cannot completely accommodate the dataset, it becomes necessary to use the disk. This is called an External Merge Sort. Internally, it still uses the same quicksort algorithm, but it:
1. Splits the data into batches.
2. Fetches consecutive batches from disk.
3. Sorts these batches individually.
4. Finally, combines the sorted batches.

Top-N Heap Sort

This method of sorting is used when only a part (N rows) of the sorted output is to be returned. It creates a heap data structure in memory and uses it to keep track of the top N (sorted) rows. To use this method, it is important that the top N rows fit into memory. If they don't, a different sorting method is used.
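A minimal sketch of the top-N idea, using Python's standard-library heap utilities (`heapq.nsmallest` maintains a bounded heap internally, so memory depends on N rather than on the size of the input):

```python
import heapq

def top_n_rows(rows, n, key):
    """Return the n smallest rows by key without sorting the whole input;
    heapq.nsmallest keeps a heap of at most n elements."""
    return heapq.nsmallest(n, rows, key=key)

amounts = [42, 7, 99, 13, 55, 1]
print(top_n_rows(amounts, 3, key=lambda a: a))  # → [1, 7, 13]
```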
Incremental Sort

Incremental Sort is a newer node, available in PostgreSQL 13 and later. It handles cases where the sort operation receives input rows that are already partially sorted. This can be the case if the child node of the sort node is a Gather Merge or an Index Scan.
For example, consider a situation where the Sort node needs to sort the data by two columns, and the child node has already sorted it by the first column. In such cases, re-sorting the entire dataset again by both columns is wasteful. So, the planner chooses to use the Incremental Sort node instead of the regular Sort node. This takes the resultset (which is already sorted based on the first column), and sorts it further only according to the second column.
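The idea can be modelled in Python: because the input is already grouped by the first key, only one (usually small) group at a time needs to be held in memory and sorted by the second key. This is a conceptual sketch, not PostgreSQL's implementation:

```python
from itertools import groupby

def incremental_sort(rows, presorted_key, remaining_key):
    """Finish sorting rows that are already ordered by presorted_key:
    sort each group of equal presorted_key values by remaining_key."""
    result = []
    for _, group in groupby(rows, key=presorted_key):
        result.extend(sorted(group, key=remaining_key))  # only one group in memory
    return result

# (flight_id, amount) rows already sorted by flight_id:
rows = [(1, 300), (1, 100), (2, 50), (2, 40), (3, 10)]
print(incremental_sort(rows, lambda r: r[0], lambda r: r[1]))
# → [(1, 100), (1, 300), (2, 40), (2, 50), (3, 10)]
```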
The following sections illustrate the concepts discussed above with hands-on examples. These examples are based on the medium-sized demo flight tickets database. Other articles in this series also use the same database. Follow the instructions in the introductory article on reading query plans to connect to this database and try out the examples in the following sections.
Sort

Depending on the type of query and the available memory, the Sort node chooses which method to use. The following sections show the different methods in practice.
External Merge Sort

It commonly happens that the dataset is relatively large and the working memory insufficient. In such situations, the Sort node uses the External Merge Sort method.
Example 1

Consider a simple sorting operation:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY amount;
To process this query:
1. It first does a parallel sequential scan on the contents of the table ticket_flights.
2. Each parallel worker individually sorts its resultset. This sorting is done using the External Merge method. Notice that each worker uses around 33 MB of disk space.
3. The sorted resultsets are combined by the topmost Gather Merge node, which preserves the sort order in the final output.

The execution plan corresponding to this is below:
"Gather Merge (cost=151927.82..381420.88 rows=1966946 width=32) (actual time=767.816..1925.619 rows=2360335 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=150927.80..153386.48 rows=983473 width=32) (actual time=737.586..963.688 rows=786778 loops=3)"
" Sort Key: amount"
" Sort Method: external merge Disk: 35472kB"
" Worker 0: Sort Method: external merge Disk: 33520kB"
" Worker 1: Sort Method: external merge Disk: 33696kB"
" -> Parallel Seq Scan on ticket_flights (cost=0.00..29504.73 rows=983473 width=32) (actual time=0.013..110.702 rows=786778 loops=3)"
"Planning Time: 0.068 ms"
"Execution Time: 2062.364 ms"
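The behaviour seen in this plan can be modelled in Python. This is a toy sketch: `batch_size` stands in for work_mem, and pickled temporary files stand in for PostgreSQL's on-disk sort runs:

```python
import heapq
import pickle
import tempfile

def external_merge_sort(rows, batch_size):
    """Sort batches that fit in 'memory', spill each sorted run to a
    temporary file on disk, then k-way merge the runs."""
    run_files = []
    for start in range(0, len(rows), batch_size):
        run = sorted(rows[start:start + batch_size])  # in-memory sort of one batch
        f = tempfile.TemporaryFile()
        pickle.dump(run, f)                           # spill the sorted run to disk
        run_files.append(f)
    runs = []
    for f in run_files:
        f.seek(0)
        runs.append(pickle.load(f))                   # read the runs back
    return list(heapq.merge(*runs))                   # merging preserves sort order

print(external_merge_sort([5, 3, 8, 1, 9, 2, 7], batch_size=3))
# → [1, 2, 3, 5, 7, 8, 9]
```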
Quicksort

If the available working memory is sufficient to accommodate the dataset to be sorted, the planner can directly use the quicksort method.
Effect of work_mem

To check this, increase the working memory to 1 GB from the default value of 4 MB:
=# SET work_mem = '1024 MB';

Verify that the value has changed:

=# SHOW work_mem;
Example 2

Re-run the earlier query from Example 1:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY amount;
The new execution plan looks like this:
"Sort (cost=293121.42..299022.26 rows=2360335 width=32) (actual time=1323.414..1617.892 rows=2360335 loops=1)"
" Sort Key: amount"
" Sort Method: quicksort Memory: 282706kB"
" -> Seq Scan on ticket_flights (cost=0.00..43273.35 rows=2360335 width=32) (actual time=0.011..266.730 rows=2360335 loops=1)"
…
"Execution Time: 1727.954 ms"
Thus, after increasing the working memory:
1. It now does a sequential scan on the ticket_flights table to retrieve the data.
2. It sorts the resultset using the Sort node and the quicksort algorithm.

Notice the total amount of memory consumed: nearly 282 MB. This is comfortably less than the (increased) working memory. Thus, to improve sorting performance, consider increasing working memory so as to avoid spillover onto disk.
Before continuing, reset the working memory parameter to its original value:

=# RESET work_mem;
Top-N Heapsort

Top-N heapsort is used when only a limited number of rows are to be returned.
Example 3

To see it in practice, modify the previous query to return only the top 1000 rows:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY amount LIMIT 1000;
For this query:
1. The execution starts with a parallel sequential scan on the ticket_flights table.
2. Each parallel worker uses the top-N heapsort method to sort its dataset.
3. The sorted output of each worker is combined by a Gather Merge node, which preserves the sort order.
4. Finally, the topmost Limit node ensures that only the desired number of rows is returned.

The execution plan looks like this:
"Limit (cost=84427.52..84544.19 rows=1000 width=32) (actual time=352.008..358.456 rows=1000 loops=1)"
" -> Gather Merge (cost=84427.52..313920.57 rows=1966946 width=32) (actual time=352.007..358.367 rows=1000 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=83427.49..85886.18 rows=983473 width=32) (actual time=327.346..327.447 rows=1000 loops=3)"
" Sort Key: amount"
" Sort Method: top-N heapsort Memory: 172kB"
" Worker 0: Sort Method: top-N heapsort Memory: 172kB"
" Worker 1: Sort Method: top-N heapsort Memory: 169kB"
" -> Parallel Seq Scan on ticket_flights (cost=0.00..29504.73 rows=983473 width=32) (actual time=0.030..116.448 rows=786778 loops=3)"
"Planning Time: 0.094 ms"
"Execution Time: 358.534 ms"
Two-column Sort

It is common to sort the rows of a table based on the values of two columns. For multi-column sorts, rows are sorted first according to the first key (column), then according to the second key, and so on.
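As a quick model of this key-by-key behaviour, Python's tuple comparison works the same way (the sample rows are hypothetical):

```python
rows = [
    {"flight_id": 2, "amount": 150.0},
    {"flight_id": 1, "amount": 300.0},
    {"flight_id": 1, "amount": 100.0},
]
# Tuples compare element by element: flight_id decides first, and
# amount breaks ties, just like ORDER BY flight_id, amount.
ordered = sorted(rows, key=lambda r: (r["flight_id"], r["amount"]))
for r in ordered:
    print(r["flight_id"], r["amount"])
# → 1 100.0
#   1 300.0
#   2 150.0
```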
Example 4

To see multi-column sorts in action, sort the rows of the table ticket_flights by two columns, flight_id and amount:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY flight_id, amount;
The plan for this query is shown below:
"Gather Merge (cost=151927.82..381420.88 rows=1966946 width=32) (actual time=843.617..1993.398 rows=2360335 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=150927.80..153386.48 rows=983473 width=32) (actual time=796.806..1096.575 rows=786778 loops=3)"
" Sort Key: flight_id, amount"
" Sort Method: external merge Disk: 29776kB"
" Worker 0: Sort Method: external merge Disk: 35368kB"
" Worker 1: Sort Method: external merge Disk: 37560kB"
" -> Parallel Seq Scan on ticket_flights (cost=0.00..29504.73 rows=983473 width=32) (actual time=0.014..106.871 rows=786778 loops=3)"
"Planning Time: 0.084 ms"
"Execution Time: 2160.181 ms"
Starting from the bottommost leaf nodes, the steps of the query plan tree are:
1. The planner reads the data from the table using a (parallel) sequential scan.
2. Each worker separately sorts its resultset from the sequential scan. Because the available working memory cannot hold the entire resultset, the sort spills over onto the disk; hence the planner uses the External Merge method.
3. Finally, a Gather Merge node (which merges sorted data) combines the resultsets of the parallel workers.

Compare this plan to the plan in Example 1, where the planner used an External Merge sort to sort by a single column. Observe that:
- The only difference is the sort key, which is now two columns.
- The cost of the plan is identical in both cases.

Effect of Indexing

In the previous example, the planner used a sequential scan followed by a Sort node. It could be more efficient to use an index scan instead.
Example 5

To study the effect of indexing on sort performance, create a new index on both the columns used as sort keys in Example 4:

=# CREATE INDEX ticket_flights_flightid_amount ON ticket_flights (flight_id, amount);
CREATE INDEX
=# VACUUM ANALYZE ticket_flights;
Re-run the same query as in Example 4:
=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY flight_id, amount;
In the query plan below, notice that:
- The new index has already taken care of sorting the rows according to both the indexed columns.
- So, there is no need to sort the resultset again. Accordingly, the planner does just an index scan and returns the result:

"Index Scan using ticket_flights_flightid_amount on ticket_flights (cost=0.43..148981.46 rows=2360335 width=32) (actual time=0.153..1866.964 rows=2360335 loops=1)"
"Planning Time: 0.222 ms"
"Execution Time: 2010.230 ms"
Notice also that the cost of the query is now considerably less than earlier.
Sort Order

The query in the previous example sorted the rows of the table ticket_flights first according to the column flight_id and then according to the column amount. This sort operation used the index ticket_flights_flightid_amount on (flight_id, amount).
Example 6 - Exercise

Re-run the same query but with the sort order reversed: instead of sorting by flight_id and then amount, sort by amount and then flight_id:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY amount, flight_id;
Study the query plan and observe that:
- Because the indexed columns no longer correspond exactly to the ORDER BY columns, the planner no longer does an index scan of the table, and instead does a (parallel) sequential scan.
- The new query plan resembles the situation without an index: parallel workers sort the individual resultsets of the parallel sequential scans, and a Gather Merge node combines the sorted outputs of the workers.
- The cost of the query is also higher.

Thus, indexes help sorting performance, but only if they are the right type of index.
Incremental Sort

The query used in Examples 4 and 5 sorts the table first by the values in the column flight_id and then by the column amount:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY flight_id, amount;
Example 7

Add a third column, ticket_no, to the ORDER BY clause of the above query:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY flight_id, amount, ticket_no;
Study the resulting query plan and observe that:
- The planner is aware that the data is to be sorted according to the flight_id and amount columns, and there is an index on those columns. So, instead of sequentially scanning the table, it does an index scan.
- The index scan produces rows that are already sorted according to the index keys, flight_id and amount. So, it only needs to sort the rows according to the newly added third sort key, ticket_no. This is accomplished using the Incremental Sort method.
- Incremental Sort internally uses the quicksort algorithm.

The query plan is shown below:
"Incremental Sort (cost=1.35..255196.95 rows=2360335 width=32) (actual time=0.372..5710.081 rows=2360335 loops=1)"
" Sort Key: flight_id, amount, ticket_no"
" Presorted Key: flight_id, amount"
" Full-sort Groups: 33686 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB"
" Pre-sorted Groups: 39699 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB"
" -> Index Scan using ticket_flights_flightid_amount on ticket_flights (cost=0.43..148859.94 rows=2360335 width=32) (actual time=0.021..2306.928 rows=2360335 loops=1)"
"Planning Time: 0.075 ms"
"Execution Time: 5847.550 ms"
Example 8 - Exercise

Modify the query of the previous example slightly by changing the order of the sort keys:

=# EXPLAIN ANALYZE SELECT * FROM ticket_flights ORDER BY ticket_no, flight_id, amount;
The new query plan shows:
- It now uses a different index, ticket_flights_pkey. This is the primary key index which is already part of the table, and it covers the ticket_no and flight_id columns.
- Because a suitable index is already available for the modified query, the planner uses it to get presorted rows.
- It then uses Incremental Sort to sort the presorted rows according to the remaining sort key, the amount column.

Example 9 - Exercise
Write a query selecting all columns (SELECT *) and involving ORDER BY on a table with many columns, such as flights. Rewrite the same query to restrict the output to a few specific columns instead of doing SELECT *. Compare the performance in both cases. Then add indexes over the relevant columns in an attempt to get the planner to use an index-only scan.

Conclusion

This article discussed the key principles of how PostgreSQL sorts data, the nodes used to sort data, and their use cases. To summarize the learnings from the examples, some ideas that can help sorting performance are:
- Increasing working memory.
- Selecting only the needed columns.
- Having the right indexes.
- Limiting the number of rows.
- In the case of multi-column sorts, being careful with the order of sort keys: ORDER BY 1, 2 can perform differently from ORDER BY 2, 1.

The next article in the series discusses general pointers for improving PostgreSQL query performance.