MySQL DISTINCT

This article describes how to remove duplicate rows in the result using MySQL DISTINCT keyword.

Sometimes, there are some duplicate rows in the result set returned by the SELECT statement. For example, there are many repeated last names in the actor table. If you want a unique result set that has no duplicate rows, you need to use DISTINCT keywords.

DISTINCT syntax

In the SELECT statement, you can specify on or more columns which you are expected unique after DISTINCT keyword . The syntax of the DISTINCT clause is as follows:

SELECT DISTINCT
    columns_list
FROM
    table_name

Here:

  • The DISTINCT keyword is between the SELECT keyword and columns list.
  • The columns_list specifies one or multiple columns which will be unique in the result set.
  • The columns_list can also be * .

DISTINCT examples

In the following examples, we use actor table from Sakila sample database as a demonstration.

DISTINCT one column

There are many duplicate last names in the actor table. Let’s take a look at them:

SELECT last_name FROM actor;
+--------------+
| last_name    |
+--------------+
| AKROYD       |
| AKROYD       |
| AKROYD       |
| ALLEN        |
| ALLEN        |
...
| ZELLWEGER    |
| ZELLWEGER    |
+--------------+
200 rows in set (0.01 sec)

Now, we use the DISTINCT keywords delete duplicate data:

SELECT DISTINCT last_name FROM actor;
+--------------+
| last_name    |
+--------------+
| AKROYD       |
| ALLEN        |
| ASTAIRE      |
| BACALL       |
| BAILEY       |
...
| ZELLWEGER    |
+--------------+
121 rows in set (0.00 sec)

We can see that there is no duplicate last names in the output result set. The number of rows in the result set has also changed from 200 rows to 121 rows.

DISTINCT multiple columns

You can also clear duplicate values ​​in multiple columns using DISTINCT. When multiple column values ​​are specified after DISTINCT keywords, a combination value of multiple columns is used to determine the uniqueness of the row.

In reality world, there are not only many duplicate last names, but also many duplicate first names. Let us retrieve first names and last names from the actor table:

SELECT last_name, first_name FROM actor;
+--------------+-------------+
| last_name    | first_name  |
+--------------+-------------+
| GUINESS      | PENELOPE    |
| WAHLBERG     | NICK        |
| CHASE        | ED          |
| DAVIS        | JENNIFER    |
| LOLLOBRIGIDA | JOHNNY      |
| NICHOLSON    | BETTE       |
...
| TEMPLE       | THORA       |
+--------------+-------------+
200 rows in set (0.01 sec)

There are 200 rows in the actor table.

Now, let us use the DISTINCT keywords remove duplicate rows:

SELECT DISTINCT last_name, first_name FROM actor;
+--------------+-------------+
| last_name    | first_name  |
+--------------+-------------+
| GUINESS      | PENELOPE    |
| WAHLBERG     | NICK        |
| CHASE        | ED          |
| DAVIS        | JENNIFER    |
| LOLLOBRIGIDA | JOHNNY      |
...
| TEMPLE       | THORA       |
+--------------+-------------+
199 rows in set (0.01 sec)

Now, there are 199 rows in the actor table.

We can see that the number of rows in the output result set has also changed from 200 rows to 199 rows. This shows that there is a duplicate row in the actor table. We can also prove this using the following SELECT statement with group.

SELECT last_name, first_name, COUNT(*)
FROM actor
GROUP BY last_name , first_name
HAVING COUNT(*) > 1;
+-----------+------------+----------+
| last_name | first_name | COUNT(*) |
+-----------+------------+----------+
| DAVIS     | SUSAN      |        2 |
+-----------+------------+----------+
1 row in set (0.00 sec)

DISTINCT and NULL

When DISTINCT meets NULL value, only one NULL value will be kept. Because DISTINCT thinks that all NULL values are the same, regardless of the data type of the column.

For example, the following SQL return multiple rows using the NULL record:

SELECT *
FROM (
    SELECT NULL
    UNION ALL
    SELECT NULL
    UNION ALL
    SELECT NULL
  ) t;
+------------+
| NULL       |
+------------+
| NULL       |
| NULL       |
| NULL       |
+------------+
3 rows in set (0.00 sec)

Let us use DISTINCT:

SELECT DISTINCT *
FROM (
    SELECT NULL
    UNION ALL
    SELECT NULL
    UNION ALL
    SELECT NULL
  ) t;
+------------+
| NULL       |
+------------+
| NULL       |
+------------+
1 row in set (0.00 sec)

In this example, we use UNION keyword to create a temprorary table that includes many NULL values.

Conclusion

In this article, you leared how to remove duplicate rows in the result set using MySQL DISTINCT clause. The main points of the DISTINCT clause are as follows:

  • DISTINCT can be used to remove duplicate rows in the result set.
  • You can specify one or multiple columns after DISTINCT keywords. You can alse use *;
  • DISTINCT thinks that all NULL values are the same.