MySQL GROUP BY

This article will describes MySQL GROUP BY clause which can group some rows by specified columns or expression.

Sometimes, we need to summarize some rows base on one or more columns. This is often used in statistics, consider the following use cases:

  • Get the average grade by class.
  • Summarize total score by student.
  • Summarize sales by year or month.
  • Summarize the number of users by country or region.

You can use the GROUP BY clauses in these use cases.

In MySQL, GROUP BY clauses are used to group some rows into summary rows by columns values or expressions.

GROUP BY syntax

The GROUP BY Clause is used in SELECT statement. The syntax of the GROUP BY clause is as follows:

SELECT column1[, column2, ...], aggregate_function(ci)
FROM table
[WHERE clause]
GROUP BY column1[, column2, ...];
[HAVING clause]

Here:

  • column1[, column2, ...] after GROUP BY keyword is the column by which the grouping is based. It can be one or more columns.
  • aggregate_function(ci) is an aggregate function, used to summarize. This is optional, but generally available. You can alse use more aggregate functions here.
  • The columns after SELECT keyword must be one or more columns in the grouping columns.
  • The WHERE clause is used to filter the rows in the result set and it is optional.
  • The HAVING clause is used to filter grouped rows and it is optional.

Here is some aggregate functions often used in the GROUP BY clause:

  • SUM()
  • AVG()
  • MAX()
  • MIN()
  • COUNT()

GROUP BY examples

In the following example, we will use actor and paypment tables from Sakila sample database.

Simple GROUP BY example

We use the GROUP BY clause to list all last names in the actor table.

SELECT last_name
FROM actor
GROUP BY last_name;
+--------------+
| last_name    |
+--------------+
| AKROYD       |
| ALLEN        |
| ASTAIRE      |
| BACALL       |
| BAILEY       |
...
| ZELLWEGER    |
+--------------+
121 rows in set (0.00 sec)

In this example, the GROUP BY clause grouped all rows based on last_name column values.

The output of this example is as same as following statement using DISTINCT:

SELECT DISTINCT last_name FROM actor;

using aggregate functions

If you want to know the count of every last name in the above example, you can use aggregate functions COUNT(). Here is the statement:

SELECT last_name, COUNT(*)
FROM actor
GROUP BY last_name
ORDER BY COUNT(*) DESC;
+--------------+----------+
| last_name    | COUNT(*) |
+--------------+----------+
| KILMER       |        5 |
| NOLTE        |        4 |
| TEMPLE       |        4 |
| AKROYD       |        3 |
| ALLEN        |        3 |
| BERRY        |        3 |
...
| WRAY         |        1 |
+--------------+----------+
121 rows in set (0.00 sec)

In this example, here is the order of execution:

  1. First, use GROUP BY clause to group all rows by last_name column values.
  2. Second, use the aggregate function COUNT(*) to count rows in each group.
  3. Finally, use ORDER BY clause to sort COUNT(*) column in descending order.

In this way, the last name KILMER with the largest number is ranked first.

GROUP BY, LIMIT, aggregate function

In this example, let us find the top 10 customers from the payment table. We will use GROUP BY clause, LIMIT clause and aggregate functions SUM().

SELECT customer_id, SUM(amount) total
FROM payment
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
+-------------+--------+
| customer_id | total  |
+-------------+--------+
|         526 | 221.55 |
|         148 | 216.54 |
|         144 | 195.58 |
|         137 | 194.61 |
|         178 | 194.61 |
|         459 | 186.62 |
|         469 | 177.60 |
|         468 | 175.61 |
|         236 | 175.58 |
|         181 | 174.66 |
+-------------+--------+
10 rows in set (0.02 sec)

In this example, here is the order of execution:

  1. First, use GROUP BY clause to group all rows by customer_id column values.
  2. Second, use the aggregate function SUM(amount) to sum amount columns in each group, and give it a alias total;
  3. Third, use ORDER BY clause to sort total column in descending order.
  4. Finally, use the LIMIT 10 clause to return the top 10 rows.

Examples of GROUP BY and HAVING

You can use HAVING clause after GROUP BY clause to filter grouped rows. This statement returns customers whose total is more than 180.

SELECT customer_id, SUM(amount) total
FROM payment
GROUP BY customer_id
HAVING total > 180
ORDER BY total DESC;
+-------------+--------+
| customer_id | total  |
+-------------+--------+
|         526 | 221.55 |
|         148 | 216.54 |
|         144 | 195.58 |
|         137 | 194.61 |
|         178 | 194.61 |
|         459 | 186.62 |
+-------------+--------+
6 rows in set (0.02 sec)

In this example, here is the order of execution:

  1. First, use GROUP BY clause to group all rows by customer_id column values.
  2. Second, use aggregate function SUM(amount) to sum amount columns in each group, and give it a alias total;
  3. Third, use HAVING clause to filtering rows which total column value is more than 180.
  4. Finally, use ORDER BY clause to sort total column in descending order.

Conclusion

In this article, you learned MySQL GROUP BY syntax and use cases. The following are key points of the GROUP BY clause:

  • The GROUP BY clause is used to group some rows by specified columns or expressions.
  • The HAVING clause is used to filter grouped rows.
  • The GROUP BY clause is often used for summarization with aggregate functions.