PostgreSQL DISTINCT Usages

This article describes how to use in PostgreSQL DISTINCT to return a result set with no duplicate rows.

In PostgreSQL, the DISTINCT keyword is used in the SELECT statement so that it returns a result set with no duplicate rows.

PostgreSQL DISTINCT syntax

To return a result set with no duplicate rows, use the SELECT statement with the DISTINCT keyword:

Here is the syntax for DISTINCT:

SELECT
   DISTINCT column1 [, column2, ...]
FROM
   table_name;

Explanation:

  • The keyword DISTINCT should be specified after SELECT.
  • Specify the columns to evaluate for duplicates after the DISTINCT keyword .
  • Multiple column names need to be separated by ,. If multiple column names are specified, MySQL will evaluate for duplicates based on the combined value of these columns.
  • You can use DISTINCT * to evaluate duplicates for all columns.

PostgreSQL also provides DISTINCT ON (expression) to keep the first row of each set of duplicates using the following syntax:

SELECT
   DISTINCT ON (column1) column_alias,
   column2
FROM
   table_name
ORDER BY
   column1,
   column2;

It is a good practice to always use the ORDER BY clause with the DISTINCT ON(expression) to make the result set predictable.

Note that the DISTINCT ON expression must match the leftmost expression in the ORDER BY clause .

PostgreSQL DISTINCT Examples

We will use the tables in the Sakila sample database for demonstration, please install the Sakila sample database in PostgreSQL first.

To retrieve all ratings of films from the film table, use the following statement:

SELECT
    DISTINCT rating
FROM
    film;
 rating
--------
 R
 PG-13
 G
 PG
 NC-17
(5 rows)

Here, in order to find all the films ratings, we use DISTINCT rating, so that each film rating appears only once in the result set.

To retrieve all rent amounts from the film table, use the following statement:

SELECT
    DISTINCT rental_rate
FROM
    film;
 rental_rate
-------------
        2.99
        4.99
        0.99
(3 rows)

Here, in order to find all the film rental amounts, we use DISTINCT rental_rate, so that each film rental amount appears only once in the result set.

To retrieve all combinations of ratings and rental amounts from the film table, use the following statement:

SELECT
    DISTINCT rating, rental_rate
FROM
    film
ORDER BY rating;
 rating | rental_rate
--------+-------------
 G      |        0.99
 G      |        4.99
 G      |        2.99
 PG     |        2.99
 PG     |        0.99
 PG     |        4.99
 PG-13  |        4.99
 PG-13  |        0.99
 PG-13  |        2.99
 R      |        0.99
 R      |        2.99
 R      |        4.99
 NC-17  |        0.99
 NC-17  |        2.99
 NC-17  |        4.99
(15 rows)

Here, we used DISTINCT rating, rental_rate to find all combinations of film ratings and rental amounts. To make the output more readable, we use ORDER BY to sort the result set in positive order by film ranking.

If you want to return the first row for each set of films ratings, use the DISTINCT ON:

SELECT
    DISTINCT ON (rating) rating,
    film_id,
    title
FROM
    film
ORDER BY rating, film_id DESC;
 rating | film_id |      title
--------+---------+------------------
 G      |       2 | ACE GOLDFINGER
 PG     |       1 | ACADEMY DINOSAUR
 PG-13  |       7 | AIRPLANE SIERRA
 R      |       8 | AIRPORT POLLOCK
 NC-17  |       3 | ADAPTATION HOLES

DISTINCT and NULL

DISTINCT treats all null values ​​are the same, regardless of the field’s type, so only one null value is left from DISTINCT.

For example the following SQL statement returns multiple rows of NULL records:

SELECT NULL nullable_col
UNION ALL
SELECT NULL nullable_col
UNION ALL
SELECT NULL nullable_col;
 nullable_col
--------------
 <null>
 <null>
 <null>
(3 rows)

Here, we have 3 rows, each of which has a nullable_col column value of NULL.

After using DISTINCT:

SELECT
    DISTINCT nullable_col
FROM
    (
    SELECT NULL nullable_col
    UNION ALL
    SELECT NULL nullable_col
    UNION ALL
    SELECT NULL nullable_col
    ) t;
 nullable_col
--------------
 <null>
(1 row)

This example uses to UNION ALL simulate a recordset containing multiple NULL values.

Conclusion

This article describes how to use the DISTINCT to clear duplicate rows in a result set. The key points of the usage of the DISTINCT are as follows:

  • The SELECT DISTINCT statement returns a result set with no duplicate rows.
  • You can specify one or more columns after DISTINCT, or you can use it *.
  • The DISTINCT treat all NULL as equal and keep only one.
  • The DISTINCT ON is used to return the first row of each group of repeated values.