Used exclusively with the SELECT statement, the DISTINCT clause in PostgreSQL removes duplicate rows so that only unique values are returned.
In today’s tech-driven world, data integrity and security are paramount. Selecting a strong relational database is not enough; you must also know how to interact with it to analyze your data correctly. Structured Query Language (SQL) allows you to run queries on your data through simple English-like statements and clauses. The DISTINCT clause is one of the important keywords, and it simplifies your search for unique values.
This article will educate you about the DISTINCT clause and how it is used in PostgreSQL, one of the well-known databases available today.
PostgreSQL, commonly referred to as Postgres, is an enterprise-grade object-relational database management system (ORDBMS). Development of this open-source database began in 1986 at the University of California, Berkeley, as a successor to the INGRES project, and it received integrated SQL support in 1994. Since then, Postgres has been recognized for its reliability, stability, and standards compliance.
PostgreSQL has a robust architecture that can accommodate both SQL (relational) and JSON (non-relational) queries. You can make use of a spectrum of SQL features such as triggers, subqueries, foreign keys, and user-defined functions. These features are valuable when developing analytical applications for the web and mobile.
Because it is open-source software, PostgreSQL remains free from corporate control and supports various operating systems, including Windows, macOS, Linux, and UNIX. This cross-platform support eliminates the need for multiple licenses, thus reducing business costs. You can also access PostgreSQL’s source code under the PostgreSQL License.
PostgreSQL holds a coveted position among the top databases globally. Take a look at some of the key features that set it apart:
The DISTINCT clause in PostgreSQL retrieves unique records and discards duplicates from the query result; the underlying table is not modified. When you apply this clause, Postgres retains a single row from each group of identical rows, so you are presented with only unique values.
The PostgreSQL DISTINCT clause operates on both single and multiple columns in a single table. You can also use the clause on multiple tables connected to one another through JOIN statements. DISTINCT uses the values in the selected columns to identify and exclude duplicates, effectively deduplicating the output. When you select multiple columns, it considers the combination of values across those columns, so a row is treated as a duplicate only if every selected column matches.
There are two caveats that you must know about PostgreSQL DISTINCT. Firstly, the clause is solely compatible with the SELECT statement. Secondly, it does not exclude null values from your dataset. If your columns contain null values, DISTINCT treats them as equal to one another, so a single null is kept in the output.
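As a small illustration of the null-handling caveat above, the following sketch runs DISTINCT over an inline list of values. The alias t and the column name color are made up for this example.

```sql
-- Inline sample values (hypothetical): two 'red' entries and two NULLs.
SELECT DISTINCT color
FROM (VALUES ('red'), (NULL), ('red'), (NULL)) AS t(color);
-- Returns two rows: 'red' and a single NULL (row order is not guaranteed).
```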
In Postgres, the DISTINCT clause follows a particular syntax:
Let’s understand the parameters in the syntax and what they mean:
There are a few different ways to use the Postgres DISTINCT clause. Take a look at three of the most used syntaxes:
In this syntax, the PostgreSQL DISTINCT clause is used for a single expression, which is a column in this case. Here, the clause will evaluate the duplicate rows in the specified column of your dataset’s table. The output will be all the unique values in the form of a list.
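The single-column form described above can be sketched as follows. Here column_1 and table_name are placeholders to be replaced with your own column and table.

```sql
-- Returns each distinct value of column_1 once.
SELECT DISTINCT column_1
FROM table_name;
```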
In this syntax, the PostgreSQL DISTINCT clause is used for multiple expressions, column_1 and column_2 here. You can also list more than two columns. The clause evaluates the columns together and looks for unique combinations of their values. The end result is the set of unique combinations, again presented in the form of a list.
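A sketch of the multi-column form, again with table_name as a placeholder:

```sql
-- A row is dropped only if BOTH column_1 and column_2 match an earlier row.
SELECT DISTINCT column_1, column_2
FROM table_name;
```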
The DISTINCT ON clause is a slight variation of the DISTINCT clause in PostgreSQL. It is also used exclusively with the SELECT statement and serves the same purpose of providing unique values from your dataset.
DISTINCT ON retrieves only the first row from every set of duplicates in the dataset. However, without an ORDER BY clause, which row counts as the first is unpredictable.
To control which row is kept and to organize your output for better readability, you can use the ORDER BY clause in conjunction with DISTINCT ON. The ORDER BY list must begin with the DISTINCT ON expressions; any further sort columns determine which row comes first within each group of duplicates, and the result is displayed in that order.
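Combining DISTINCT ON with ORDER BY might look like the sketch below, where column_1, column_2, and table_name are placeholders.

```sql
-- One row is kept per distinct column_1 value.
-- Within each group, the row with the smallest column_2 sorts first and is kept.
SELECT DISTINCT ON (column_1) column_1, column_2
FROM table_name
ORDER BY column_1, column_2;
```

Changing the second sort key to column_2 DESC would keep the row with the largest column_2 in each group instead.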
You have understood the syntax, and now it’s time to look at some examples that will illustrate the uses of the clause. For easy reference, let us examine the following table named Clothes with two columns, shirt_color and trousers_color.
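To follow along, a table of this shape could be set up as sketched here. The column types and the sample rows are assumptions chosen for illustration, not the data from the original walkthrough.

```sql
-- Hypothetical setup: column types and rows are illustrative only.
CREATE TABLE Clothes (
    shirt_color    VARCHAR(20),
    trousers_color VARCHAR(20)
);

INSERT INTO Clothes (shirt_color, trousers_color) VALUES
    ('red',   'blue'),
    ('red',   'blue'),    -- exact duplicate of the first row
    ('green', 'black'),
    ('blue',  'black'),
    ('green', 'blue');
```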
In this case, the Postgres SELECT DISTINCT statement has been used to retrieve non-duplicate values from the column shirt_color found in the Clothes table. The ORDER BY clause sorts the output of the query in ascending alphabetical order.
Here, the PostgreSQL SELECT DISTINCT statement is used for more than one column. It returns each unique combination of values in the shirt_color and trousers_color columns, with duplicate rows removed.
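The single-column query described above can be written along these lines, assuming a Clothes table with a shirt_color column:

```sql
-- Unique shirt colors, sorted alphabetically (ascending is the default).
SELECT DISTINCT shirt_color
FROM Clothes
ORDER BY shirt_color;
```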
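A sketch of the multi-column query this paragraph refers to, assuming the same Clothes table:

```sql
-- Each (shirt_color, trousers_color) pair appears once in the result.
SELECT DISTINCT shirt_color, trousers_color
FROM Clothes;
```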
The DISTINCT ON clause retains only the first row for each group of duplicates in your dataset. Here, your result will be sorted by trousers_color first and then by shirt_color. Your output will contain one row for each of the colors.
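One query that matches this description is sketched below. It assumes DISTINCT ON is applied to trousers_color, since the leading ORDER BY column has to match the DISTINCT ON expression.

```sql
-- One row per trousers_color; within each group, the alphabetically
-- first shirt_color is the row that is kept.
SELECT DISTINCT ON (trousers_color) trousers_color, shirt_color
FROM Clothes
ORDER BY trousers_color, shirt_color;
```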
To understand the usage of the PostgreSQL DISTINCT clause across multiple tables in the dataset, let us look at two tables: employees and branches.
To create the table for employees with two columns, the syntax will be:
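A minimal sketch of such a table follows. The column names employee_name and branch_id and their types are assumptions made for this example.

```sql
-- Hypothetical two-column layout: who works where.
CREATE TABLE employees (
    employee_name VARCHAR(50),
    branch_id     INT
);
```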
Inserting values into the employees table:
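For example, assuming the employees table has the hypothetical columns employee_name and branch_id, a few illustrative rows could be added like this:

```sql
-- Illustrative rows only; several employees share the same branch_id.
INSERT INTO employees (employee_name, branch_id) VALUES
    ('Asha', 1),
    ('Ben',  1),
    ('Chen', 2),
    ('Dana', 2);
```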
To create the branches table, the syntax will be:
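A possible definition is sketched here. The column names branch_id and branch_name and their types are assumptions made for this example.

```sql
-- Hypothetical layout: one row per branch.
CREATE TABLE branches (
    branch_id   INT,
    branch_name VARCHAR(50)
);
```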
Inserting values into the branches table:
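Assuming the branches table has the hypothetical columns branch_id and branch_name, some illustrative rows might be:

```sql
-- Illustrative rows only.
INSERT INTO branches (branch_id, branch_name) VALUES
    (1, 'North'),
    (2, 'South'),
    (3, 'East');
```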
To get a list of unique values, join the employees and branches tables:
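A sketch of such a join, assuming employees(employee_name, branch_id) and branches(branch_id, branch_name); these column names are hypothetical:

```sql
-- Without DISTINCT, a branch name would appear once per matching employee.
-- With DISTINCT, each branch that has at least one employee appears once.
SELECT DISTINCT b.branch_name
FROM employees AS e
JOIN branches AS b ON b.branch_id = e.branch_id
ORDER BY b.branch_name;
```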
UNION automatically removes duplicates, while UNION ALL keeps them.
Example:
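The difference can be seen in a small self-contained sketch that uses inline values instead of real tables. The aliases a and b and the column name city are made up for illustration.

```sql
-- UNION: 'Rome' occurs in both inputs but is returned only once.
SELECT city FROM (VALUES ('Paris'), ('Rome')) AS a(city)
UNION
SELECT city FROM (VALUES ('Rome'), ('Oslo')) AS b(city);
-- Three rows: Paris, Rome, Oslo (row order is not guaranteed).

-- UNION ALL: every input row is kept, so 'Rome' is returned twice.
SELECT city FROM (VALUES ('Paris'), ('Rome')) AS a(city)
UNION ALL
SELECT city FROM (VALUES ('Rome'), ('Oslo')) AS b(city);
-- Four rows.
```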
Useful when you need to keep one row based on a condition.
Example:
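One way to keep a single row per group is DISTINCT ON combined with ORDER BY. The sketch below assumes a hypothetical orders table with customer_id, order_id, and order_date columns; both the table and the choice of technique are assumptions for illustration.

```sql
-- One row per customer_id; sorting order_date in descending order
-- makes the most recent order the first row in each group.
SELECT DISTINCT ON (customer_id) customer_id, order_id, order_date
FROM orders
ORDER BY customer_id, order_date DESC;
```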
This query returns the latest order for each customer.
Each method has its use cases:
PostgreSQL is a versatile platform favored by several organizations across various industries. The PostgreSQL SELECT DISTINCT clause efficiently fetches unique row values, whether applied to single columns, multiple columns, or across several tables. Using it with the ORDER BY clause yields sorted results, making the output of a complex query easier to read.
If you have vast datasets to organize, you can use pre-built connectors of Airbyte to move the data into centralized storage quickly. Consolidate your data from multiple sources with Postgres by signing up today!