Article

How to Read PostgreSQL Query Plans

•

May 16, 2024

•

10 min read

The previous article in this series discussed the core concepts of query plans and showed how to start using the EXPLAIN commands. The output of the EXPLAIN command can be somewhat cryptic.

This article introduces how to read and understand the output of the EXPLAIN commands. It does this with a few simple examples.

Test Database

To try things in practice, it is always advantageous to use a real-life database. All the articles in this series use the flight bookings database publicly available from PostgresPro.

Download the medium sized demo database with 3-months flight data from PostgresPro.

$ wget https://edu.postgrespro.com/demo-medium-en.zip

Extract the zip file, which is around 60 MB in size:

$ unzip demo-medium-en.zip

It extracts a file "demo-medium-en-20170815.sql". This file is a script to create a new database and populate it. It creates a new database "demo". Ensure you do not currently have any useful data in a database with the name "demo" - it will be dropped.

Using the Linux command-line, import the SQL script file into Postgres:

$ psql -f demo-medium-en-20170815.sql -U postgres

It is a large dump file, around 250 MB in size (after extracting). Running the script to populate the database can take a minute. After it is done, you should have a database named “demo”. Connect to it using the PSQL command line:

=# \c demo

After connecting to the database, check the summary description of all the tables in PSQL:

demo=# \d

                   List of relations
  Schema  |         Name          |   Type   |  Owner   
----------+-----------------------+----------+----------
 bookings | aircrafts             | view     | postgres
 bookings | aircrafts_data        | table    | postgres
 bookings | airports              | view     | postgres
 bookings | airports_data         | table    | postgres
 bookings | boarding_passes       | table    | postgres
 bookings | bookings              | table    | postgres
 bookings | flights               | table    | postgres
 bookings | flights_flight_id_seq | sequence | postgres
 bookings | flights_v             | view     | postgres
 bookings | routes                | view     | postgres

 bookings | seats                 | table    | postgres
 bookings | ticket_flights        | table    | postgres
 bookings | tickets               | table    | postgres

Many of the tables and their data should be self-explanatory. Below is a quick overview of the individual tables and views which are used in queries on this database:

‘aircrafts’ - This is a view based on the ‘aircrafts_data’ table (not to be used directly). It shows information like the model name, code, and flight range of different aircrafts.
‘airports’ - This view has information like airport code, name of the city, etc. of different airports. It is based on the table ‘airports_data’, which is not to be accessed directly.
‘seats’ - This is a list of seat numbers and the class of each seat on each kind of aircraft.
‘routes` - This table stores the names/codes of the arrival and departure airports, the flight number, flight duration, and flight dates of each route.
‘flights’ - This is a listing of individual flights with information about their scheduled departure and arrival times, departure and arrival airports, flight status, aircraft code, and so on.
‘bookings’ - It has the booking reference, the booking data, and the total (fare) amount of each booking. There can be multiple tickets in a booking.
‘tickets’ - This stores the ticket number, booking reference and passenger details on each ticket.
‘ticket_flights’ - This table has the ticket number, flight id, and fare amount for each flight in each ticket. Note that a ticket can consist of multiple flights.
‘boarding_passes’ - This table has the ticket number, flight id, boarding number, and seat number of all the boarding passes. There can be multiple boarding passes for each ticket.

To see the columns, indices, and foreign keys on a specific table from the command line:

demo=# \d ticket_flights ;

Also take a look at some rows of the individual tables to get a better idea:

demo=# SELECT * FROM seats ORDER BY random() LIMIT 5 ;

To know more, check the database's schema diagram and read the description of the tables and other objects. The examples in this article-series use queries on the tables in this database.

Introduction to Reading Query Plans

This section presents 3 examples to explain in-practice how to read the query plan for simple queries on the test database. The EXPLAIN command can be run by itself, or with options like ANALYZE, BUFFERS, etc.

EXPLAIN returns the estimated cost of executing the query and the individual steps in the execution path.
EXPLAIN ANALYZE executes the query. In addition to the estimated cost and the plan, it also shows the actual execution time.
EXPLAIN (ANALYZE, BUFFERS) shows the amount of data retrieved from cache and the amount that had to be read from disk (in addition to estimated cost and execution times).

To use an EXPLAIN command at the PSQL command line, prepend it to the SQL query. For example:

=# -- pseudocode EXPLAIN SELECT * FROM TABLENAME ;

=# -- pseudocode EXPLAIN ANALYZE SELECT * FROM TABLENAME ;

=# -- pseudocode EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM TABLENAME ;

The following examples illustrate how to use these commands and how to read their output. The first example is based on a trivial query, subsequent examples build on top of earlier examples. Each example focuses on a different aspect of the query plan.

EXPLAIN

When run by itself, the EXPLAIN command accepts a SQL query as argument and returns:

The query plan. This is the sequence of internal operations necessary to execute the query
The estimated cost of executing the query. This cost is denoted in arbitrary internal units, it is based on the underlying operations that need to be performed to run the query. It cannot be directly translated to time units.

Example 1

To start with, run the EXPLAIN command on a trivial query to select everything from a table in the test database:

=# EXPLAIN SELECT * FROM bookings ;

The body of the output looks like this:

"Seq Scan on bookings  (cost=0.00..9714.33 rows=593433 width=21)"

This query plan has only one step (node). It is explained below:

Seq Scan on bookings - The query performs a sequential scan on the entire table. Sequential scanning is like a brute force approach to getting all the data from a table. The planner can also choose to use an index, if a suitable index is available, and if using it is efficient.
cost=0.00..9714.33 - The two numbers, separated by the double dots, following the “cost =” term indicate the estimated startup cost and the estimated total cost. The planner makes a heuristic prediction based on randomly sampled table statistics. some text
- The estimated startup cost is 0. The startup cost is the processing time needed before that node can start returning rows - based on the operation of this node. A sequential scan starts returning rows as soon as it starts scanning the data.
- The estimated execution cost is 9714. This is a relative number based on internal planner metrics. It is difficult to translate this number to execution time in microseconds. However, execution costs of different queries can be directly compared to determine which query is expected to run faster.
- The execution cost of a parent node includes the execution cost of its child nodes.
rows=593433 - The planner estimates this operation to return 593433 rows. some text
- The planner’s estimate of the number of rows to be returned is based on table statistics derived from random sampling.
width=21 - The width represents the average amount of data in each row.

EXPLAIN ANALYZE

In addition to doing the same thing as EXPLAIN, the EXPLAIN ANALYZE command also executes the query and shows the actual execution time (in milliseconds).

Example 2

After the trivial query of the previous example, use the EXPLAIN command on a simple query on a single table:

=# EXPLAIN ANALYZE SELECT * FROM bookings WHERE total_amount < 4000 ;

The output looks like this:

"Gather  (cost=1000.00..7914.90 rows=441 width=21) (actual time=0.899..36.049 rows=257 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  ->  Parallel Seq Scan on bookings  (cost=0.00..6870.80 rows=184 width=21) (actual time=0.623..31.386 rows=86 loops=3)"
"        Filter: (total_amount < '4000'::numeric)"
"        Rows Removed by Filter: 197725"
"Planning Time: 0.077 ms"
"Execution Time: 36.100 ms"

Notice that the query plan is indented and organized in stages. This is the query plan tree. Each stage is called a node in the query plan tree.

The topmost node of the tree represents the final operation before returning results.
Child nodes start with the -> symbol.
The bottommost node is typically a scan node - it involves fetching data from the table (with or without using indexes) before filtering for the relevant information.

In the example query plan above:

The topmost node is a Gather node. This node is used to combine the output of parallel processes in the nodes below it. In this case, the operations below the Gather node are performed in parallel using 2 worker processes.

"Gather  (cost=1000.00..7914.90 rows=441 width=21)"
"  Workers Planned: 2"

To start returning rows, this (parent) node depends on the child nodes to finish and share their output.
Internally, planning and launching parallel worker processes has a fixed cost of 1000. Hence, the Gather node has a startup cost of 1000.
Gather can start combining (the output of the parallel workers) and returning rows as soon as it starts receiving them.
The gather node has an execution cost of 7915.
The second node performed a Sequential Scan in parallel. This sequentially reads rows from the table. Each parallel worker is assigned different (contiguous) parts of the data to scan.

"  ->  Parallel Seq Scan on bookings  (cost=0.00..6870.80 rows=184 width=21)"

Because there are no nodes below this node, it is not dependent on any other operations to finish before it can start.
Sequential scan starts returning rows as soon as it reads them. Thus, the startup cost is 0.00.
The execution cost is 6870.
Notice that the execution cost of the parent node (Gather) is 7915 - which is higher than the execution cost of this node (6870). The sequential scan needs to have finished before the gather node can return the final row.
While scanning, it uses filters to apply the conditions specified by the WHERE clauses. In this case, there is one condition in the WHERE clause. So, the filter applied is:

"        Filter: (total_amount < '4000'::numeric)"

EXPLAIN (ANALYZE, BUFFERS)

To use more than one option with EXPLAIN, enclose the options in parentheses. The BUFFERS option shows information of how many pages are read from cache, how many are read from disk, and if any pages are written to disk.

Reading from disk is slower than reading pre-fetched data from the cache. PostgreSQL caches the results of commonly used previous queries. However, the cache is limited in size. Thus, caching the result of a new query can mean having to erase (overwrite) cached results from earlier queries.

Example 3

Start with the query of the previous example, and add the BUFFERS parameter to the EXPLAIN command:

=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM bookings WHERE total_amount < 4000 ;

Structurally, the output resembles that of the previous query, but there is some additional information (highlighted).

"Gather  (cost=1000.00..7916.50 rows=457 width=21) (actual time=0.941..31.852 rows=257 loops=1)"
"  Workers Planned: 2"
"  Workers Launched: 2"
"  Buffers: shared hit=3780"
"  ->  Parallel Seq Scan on bookings  (cost=0.00..6870.80 rows=190 width=21) (actual time=0.392..27.054 rows=86 loops=3)"
        Filter: (total_amount < '4000'::numeric)"
"        Rows Removed by Filter: 197725"
"        Buffers: shared hit=3780"
"Planning Time: 0.083 ms"
"Execution Time: 31.910 ms"

The highlighted line indicates that the query was able to read 3780 pages from the cache and did not have to look up the data on the disk. As a further example, consider this excerpt from the plan of another query:

"        Buffers: shared hit=32 read=13897, temp written=10814"

The above line indicates:

The query read 32 pages from cache,
It had to look up 13897 pages from the disk,
It wrote 10814 pages from memory to the disk, so as to free up memory space for new pages.

The more pages a query is able to retrieve from cache, the faster it runs. Commonly looked up query results are cached, so that they return faster.

Understanding Startup and Execution Costs

As explained earlier, the startup cost is the processing time needed before that node can start returning rows. A scan node can start returning rows as soon as it scans the data. So, it has a 0 startup cost. A hash node needs time to generate the hash. So, it has a non-zero startup cost.

Execution cost is the cost for a node to finish execution - return all the rows and terminate the operation. The execution cost of a parent node includes the execution cost of its child nodes.

It can be confusing to figure out how startup and execution costs of a parent node depend on the costs of the child node. These guidelines can help understand it better:

If a parent node needs that the child node finish execution before it (the parent) can even start, the startup cost of the parent is higher than (or equal to) the execution cost of the child. This is the case with, for example, sort nodes, which need the entire dataset before they can start sorting.
If a parent node can start returning rows almost as soon as it starts receiving the output of the child nodes, the startup cost of the parent is higher than (or equal to) the startup cost of the child. This is the case with, for example, the nested loop join, which can start returning joined rows as soon as it receives matching rows from the tables to be joined.

Conclusion

This article introduced how to read and interpret simple query plans. In the simple queries used in the above examples, the planner used only 2 types of nodes - “gather” and (parallel) “sequential scan”. Later articles in the series dive deeper into query plans for readingdata, joining tables, aggregating data, and sorting data. These plans use many more types of nodes beyond gather and sequential scan. The next article in the series gives an overview of the most commonly used nodes.

💡Suggested Read:

‍PostgreSQL Query Plans for Aggregating Data

PostgreSQL Query Plans for Sorting Data

PostgreSQL Use Cases

Limitless data movement with free Alpha and Beta connectors

Introducing: our Free Connector Program ->

The data movement infrastructure for the modern data teams.

Try a 14-day free trial

About the Author

Arun's interest in building new things from scratch led him to technology and to learning full stack development skills. He owes much of what he knows to well written documentation. He hopes to pay it forward by sharing his own learnings in the form of hands-on tutorials. His current interests are data analysis and AI/ML tools.