PostgreSQL DISTINCT: Optimize Queries, Eliminate Redundancy

In today’s tech-driven world, data integrity and security are paramount. Selecting a strong relational database is not enough; you must also know how to query it to analyze your data correctly. Structured Query Language (SQL) lets you query your data through simple, English-like statements and clauses. DISTINCT is one of the keywords that helps you retrieve only the unique values in a result.

This article will educate you about the DISTINCT clause and how it is used in PostgreSQL, one of the well-known databases available today.

What is PostgreSQL?

PostgreSQL, commonly referred to as Postgres, is an enterprise-grade object-relational database management system (ORDBMS). This open-source relational database was developed in 1986 at the University of California, Berkeley. It was a successor to the INGRES project and received integrated SQL support in 1994. Since then, Postgres has been recognized for its reliability, stability, and compliance measures.

PostgreSQL has a robust architecture that can accommodate both SQL (relational) and JSON (non-relational) queries. You can make use of a range of SQL features such as triggers, sub-queries, foreign keys, and user-defined functions. These features help when you are developing analytical applications for the web and mobile.

Since it is open-source software, PostgreSQL remains free from corporate control and runs on various operating systems, including Windows, macOS, Linux, and UNIX. It is distributed under the PostgreSQL License, so you do not need to purchase separate licenses, which reduces business costs, and you can access its source code.

Key Features of PostgreSQL

PostgreSQL holds a coveted position among the top databases globally. Take a look at some of the key features that set it apart:

Point in Time Recovery (PITR): Through a comprehensive write-ahead log (WAL), Postgres continuously logs the changes made to the database. Because the log records when each change occurred, you can restore the database to a chosen point in time whenever needed.
Supports Stored Procedures: Stored procedures can be understood as database routines that contain one or more SQL statements. Postgres supports multiple procedural languages that enable the creation and execution of stored procedures for a set of tasks on the database server.
Foreign Data Wrapper Library: A foreign data wrapper library can communicate and obtain data from an external source. PostgreSQL allows you to access foreign data through this library. If none of the existing foreign data wrappers suit your needs, you can even create a custom one.
Dynamic Loading: Postgres utilizes the dynamic loading mechanism to copy an object file into the server and make all the functions within the file available to the user. This feature helps you to implement new functions or types in Postgres even when the database is running.
Customization: PostgreSQL allows you to extend its functionality through plugins and extensions. You can also integrate custom functions written in other programming languages like Java, C++, and more.
Compliance with Industry Standards: PostgreSQL is ACID (Atomicity, Consistency, Isolation, Durability) compliant, meaning its transactions meet a set of guarantees that make a database reliable. Along with this, Postgres conforms closely to the SQL standard.
Vast Community: PostgreSQL's development is driven by a dedicated community of professionals and contributors. Community support helps in making the system’s security robust through regular bug identification and reporting, verification of code, and security fixes. You also receive extensive third-party support services with PostgreSQL.

What is the Postgres DISTINCT Clause?

The DISTINCT clause in PostgreSQL returns unique records and discards duplicates from the query result. When you apply this clause, Postgres keeps a single row from each group of identical rows, so the output contains only distinct values. The underlying table is not modified.

The PostgreSQL DISTINCT clause operates on both single and multiple columns in a single table. You can also use the clause on multiple tables connected to one another through JOIN statements. DISTINCT uses the values in the selected columns to identify and exclude duplicates, effectively deduplicating the output. When you select multiple columns, it considers the combination of values across those columns, so two rows are duplicates only if they match in every selected column.

There are two caveats that you must know about PostgreSQL DISTINCT. Firstly, the clause is used only within the SELECT statement. Secondly, it does not exclude null values from your result. For the purposes of DISTINCT, null values are treated as equal to one another, so if a column contains nulls, a single null row appears in the output.
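
To see how DISTINCT treats nulls, here is a minimal sketch that uses an inline VALUES list in place of a real table. The alias t and the column name color are made up for illustration.

```sql
-- Hypothetical inline data: two NULLs and a repeated value.
SELECT DISTINCT color
FROM (VALUES ('red'), (NULL), ('red'), (NULL), ('blue')) AS t(color)
ORDER BY color;
-- DISTINCT treats the two NULLs as duplicates of each other,
-- so the result has three rows: blue, red, and a single NULL
-- (ascending order places NULL last by default in PostgreSQL).
```

If you do not want the null row at all, a WHERE color IS NOT NULL filter removes it before DISTINCT is applied.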

Syntax

In Postgres, the DISTINCT clause follows a particular syntax:

SELECT DISTINCT [ON (distinct_expressions)]
column
FROM table
[WHERE conditions];

Let’s understand the parameters in the syntax and what they mean:

distinct_expressions: These are the expressions used with DISTINCT ON to decide which rows count as duplicates of one another.
column: The name of the column to evaluate for unique records. The query may list more than one column.
table: The name of the table from which the records are retrieved. The FROM clause requires at least one table to carry out the query.
WHERE conditions: An optional part of the query outlining the criteria that records must meet to be selected.

Syntaxes where the PostgreSQL DISTINCT Clause can be used

There are a few different ways to use the Postgres DISTINCT clause. Take a look at three of the most used syntaxes:

Syntax 1: Single Expression

SELECT
DISTINCT column_1
FROM
table_name;

In this syntax, the PostgreSQL DISTINCT clause is used for a single expression, which is a column in this case. Here, the clause will evaluate the duplicate rows in the specified column of your dataset’s table. The output will be all the unique values in the form of a list.

Syntax 2: Multiple Expressions

SELECT
DISTINCT column_1, column_2
FROM
table_name;

In this syntax, the PostgreSQL DISTINCT clause is used for multiple expressions, column_1 and column_2 here. The query may list more than two columns. The clause evaluates the columns together and treats two rows as duplicates only when they match in every listed column. The end result is a list of the unique combinations of values.

Syntax 3: DISTINCT ON Clause in PSQL

SELECT
DISTINCT ON (column_1) column_1 AS column_alias,
column_2
FROM
table_name
ORDER BY
column_1,
column_2;

The DISTINCT ON clause is a slight variation of the DISTINCT clause in PostgreSQL. It is also used exclusively with the SELECT statement and serves the same purpose of providing unique values from your dataset.

DISTINCT ON retains only the first row from every group of rows that share the same value for the expressions in parentheses. Without an ORDER BY clause, however, which row counts as the first in each group is unpredictable.

To control which row is kept, use the ORDER BY clause in conjunction with DISTINCT ON. The ORDER BY list must begin with the DISTINCT ON expressions; any further sort columns decide which row comes first within each group, and the output is returned in that sorted order.
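
As a rough sketch of how ORDER BY decides which row DISTINCT ON keeps, the query below uses made-up inline rows. The names orders, customer, and amount are assumptions for this illustration only.

```sql
-- Hypothetical inline data standing in for a real table.
SELECT DISTINCT ON (customer) customer, amount
FROM (VALUES
    ('ann', 10),
    ('ann', 30),
    ('bob', 20),
    ('bob',  5)
) AS orders(customer, amount)
ORDER BY customer, amount DESC;
-- One row per customer, the one with the largest amount:
-- ann | 30
-- bob | 20
```

Switching amount DESC to amount ASC would keep the smallest amount for each customer instead.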

Postgres DISTINCT Clause Examples

You have understood the syntax, and now it’s time to look at some examples that illustrate the uses of the clause. For easy reference, the examples use a table named Clothes with two columns, shirt_color and trousers_color.
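
If you want to run the examples yourself, a Clothes table could be set up along these lines. The column types and the rows are assumptions, chosen only so that some colors repeat.

```sql
-- Assumed sample data; the values are illustrative only.
CREATE TABLE Clothes (
    shirt_color    text,
    trousers_color text
);

INSERT INTO Clothes (shirt_color, trousers_color) VALUES
    ('red',   'black'),
    ('red',   'black'),
    ('blue',  'black'),
    ('blue',  'grey'),
    ('green', 'grey');
```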

Example 1: Using Postgres DISTINCT for a Single Column

SELECT
DISTINCT shirt_color
FROM
Clothes
ORDER BY
shirt_color;

In this case, the Postgres SELECT DISTINCT statement has been used to retrieve non-duplicate values from the column shirt_color found in the Clothes table. The ORDER BY statement has been used to sort the output of the query in an ascending alphabetical order.

Example 2: Using Postgres DISTINCT for Multiple Columns

SELECT
DISTINCT shirt_color,
trousers_color
FROM
Clothes
ORDER BY
shirt_color,
trousers_color;

Here, the SELECT DISTINCT statement is used for more than one column. This clause returns the unique combinations of values for the shirt_color and trousers_color columns. Duplicate rows are removed, so each combination appears only once in the output.
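
As a rough illustration of the result, the sketch below runs the same query against made-up inline rows instead of a stored table. The row values are assumptions.

```sql
-- Hypothetical rows; ('red', 'black') appears twice.
SELECT DISTINCT shirt_color, trousers_color
FROM (VALUES
    ('red',  'black'),
    ('red',  'black'),
    ('red',  'grey'),
    ('blue', 'black')
) AS Clothes(shirt_color, trousers_color)
ORDER BY shirt_color, trousers_color;
-- The repeated combination is collapsed into one row:
-- blue | black
-- red  | black
-- red  | grey
```

Note that red still appears twice, because ('red', 'black') and ('red', 'grey') are different combinations.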

Example 3: Using Postgres DISTINCT ON Clause

SELECT
DISTINCT ON (trousers_color)
trousers_color AS pants_color,
shirt_color
FROM
Clothes
ORDER BY
trousers_color,
shirt_color;

This DISTINCT ON clause retains only the first row for each group of rows that share the same trousers_color. Here, the rows are sorted by trousers_color first and then by shirt_color, so the row kept for each trousers color is the one with the alphabetically first shirt_color. Your output will have one row for each trousers color.

Example 4: Using Postgres DISTINCT for Multiple Tables

To understand the usage of the PostgreSQL DISTINCT clause across multiple tables in the dataset, let us look at two tables: employees and branches.

To create the table for employees with two columns, the syntax will be:

CREATE TABLE employees (
emp_name text not null,
emp_id text not null,
CONSTRAINT employees_pk PRIMARY KEY (emp_name)
);

Inserting values into the employees table:

INSERT INTO employees (emp_name, emp_id) VALUES
('Paul','001'),
('Ralph','002'),
('Sean','003');

To create the table for branches with two columns, the syntax will be:

CREATE TABLE branches (
emp_id text not null,
br_name text not null
);

Inserting values into the branches table:

INSERT INTO branches (emp_id, br_name) VALUES
('001','India'),
('002','Australia'),
('003','Japan');

To get a list of unique values, join the employees and branches table:

SELECT DISTINCT ON (e.emp_id) e.emp_id, e.emp_name,
b.br_name
FROM employees e
JOIN branches b ON b.emp_id = e.emp_id
ORDER BY e.emp_id DESC;
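
DISTINCT ON matters most in a join when one row matches several rows on the other side. The sketch below uses inline rows in which, as an assumption for illustration, employee 001 is linked to two branches.

```sql
-- Hypothetical data: employee 001 matches two branch rows.
SELECT DISTINCT ON (e.emp_id) e.emp_id, e.emp_name, b.br_name
FROM (VALUES ('001', 'Paul'), ('002', 'Ralph')) AS e(emp_id, emp_name)
JOIN (VALUES
    ('001', 'India'),
    ('001', 'Japan'),
    ('002', 'Australia')
) AS b(emp_id, br_name) ON b.emp_id = e.emp_id
ORDER BY e.emp_id, b.br_name;
-- The join yields two rows for 001; DISTINCT ON keeps the first
-- one in sort order, so each employee appears once:
-- 001 | Paul  | India
-- 002 | Ralph | Australia
```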

Alternatives to DISTINCT in PostgreSQL

1. UNION vs. DISTINCT

UNION automatically removes duplicates, while UNION ALL keeps them.

Example:


-- Using UNION
SELECT column1 FROM table1
UNION
SELECT column1 FROM table2;

-- Equivalent to:
SELECT DISTINCT column1 FROM
(SELECT column1 FROM table1
UNION ALL
SELECT column1 FROM table2) subquery;

2. Using EXISTS for deduplication

Useful when you need to keep one row based on a condition.

Example:


SELECT *
FROM orders o1
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o2
    WHERE o2.customer_id = o1.customer_id
    AND o2.order_date > o1.order_date
);

This query returns the latest order for each customer. If a customer has several orders sharing the same latest order_date, all of them are returned.
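
For comparison, the same "latest order per customer" result can be sketched with DISTINCT ON. The inline orders rows and the order_id column are assumptions for illustration; only customer_id and order_date come from the query above.

```sql
-- Hypothetical rows standing in for the orders table.
WITH orders(order_id, customer_id, order_date) AS (
    VALUES
        (1, 'c1', DATE '2024-01-05'),
        (2, 'c1', DATE '2024-03-10'),
        (3, 'c2', DATE '2024-02-01')
)
SELECT DISTINCT ON (customer_id) *
FROM orders
ORDER BY customer_id, order_date DESC;
-- One row per customer, the one with the most recent order_date:
-- 2 | c1 | 2024-03-10
-- 3 | c2 | 2024-02-01
```

Unlike the NOT EXISTS version, DISTINCT ON always returns exactly one row per customer, even when two orders share the latest date.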

Each method has its use cases:

UNION is simpler but less flexible
EXISTS offers more control over which rows to keep

Final Takeaways

PostgreSQL is a versatile platform favored by organizations across various industries. The SELECT DISTINCT clause fetches unique rows, whether applied to a single column, multiple columns, or the result of joining several tables. Using it with the ORDER BY clause yields sorted results, making the output easier to read.

If you have vast datasets to organize, you can use pre-built connectors of Airbyte to move the data into centralized storage quickly. Consolidate your data from multiple sources with Postgres by signing up today!
