PostgreSQL GROUP BY - Grouping and Aggregate

This article describes how to use the GROUP BY clause in a PostgreSQL SELECT statement to group rows according to a specified column or expression.

In PostgreSQL, the GROUP BY clause is used to group rows according to a specified column or expression.

Sometimes, we need to aggregate the result set according to a certain dimension. This is often used in statistics, consider the following scenarios:

  • Average grades by class.
  • Aggregate total scores by student.
  • Sales by year or month.
  • Count the number of users by country or region.

PostgreSQL GROUP BY syntax

The GROUP BY clause is an optional clause of a SELECT statement. To group rows in a SELECT statement, use the GROUP BY clause with the following syntax:

SELECT column1[, column2, ...], aggregate_function(ci)
FROM table
[WHERE clause]
GROUP BY column1[, column2, ...];
[HAVING clause]

Explanation:

  • The column1[, column2, ...] is the colums to group by, at least one column, but multiple columns.

  • The aggregate_function(ci) is an aggregate function, used to aggregate. This is optional. You can use one or more of the following aggregate functions:

    • sum(): Calculate the sum of the specified values within the group
    • avg(): Calculate the mean of the specified values within the group
    • max(): Calculate the maximum value of the specified values in the group
    • MIN(): Calculate the minimum value of the specified values in the group
    • count(): Count the number of the specified values in a group
  • The columns of SELECT must be the columns in the grouping clause.

  • The WHERE clause is optional and is used to filter rows before grouping.

  • The HAVING clause is optional and is used to filter groups.

PostgreSQL GROUP BY Examples

We will use the tables in the Sakila sample database for demonstration, please install the Sakila sample database in PostgreSQL first.

Simple Example for GROUP BY

The following statement uses the GROUP BY clause to find a list of last names from the actor table.

SELECT last_name
FROM actor
GROUP BY last_name;
  last_name
--------------
 AKROYD
 BRIDGES
 HUNT
 GIBSON
 ALLEN
 SUVARI
 HESTON
 MONROE
 ...
 WILSON
(121 rows)

In this example, the GROUP BY clause groups all rows by the last_name column.

The output of this example is exactly the same as the following statement with DISTINCT:

SELECT DISTINCT last_name FROM actor;

Examples with GROUP BY and Aggregate Functions

The following statement uses the GROUP BY clause and aggregate function count() to find the list of last names and the number of each last name from the actor table.

SELECT last_name, count(*)
FROM actor
GROUP BY last_name
ORDER BY count(*) DESC;
  last_name   | count
--------------+-------
 KILMER       |     5
 TEMPLE       |     4
 NOLTE        |     4
 WILLIAMS     |     3
 PECK         |     3
 HOPKINS      |     3
 DAVIS        |     3
 HARRIS       |     3
 DEGENERES    |     3
 ...
 CLOSE        |     1
(121 rows)

In this example, the steps of execution is as follows:

  1. First, the GROUP BY clause groups all rows in the actor table by last name.
  2. Then, use the aggregate function count(*) to aggregate the number of rows for each last name.
  3. Finally, use the ORDER BY clause to sort in descending order by count(*), and the surnames with the most numbers come first.

Likewise, if you want to find the number of films for each film rating from the film table, use the following statement:

SELECT rating, count(*)
FROM film
GROUP BY rating
ORDER BY count(*) DESC;
 rating | count
--------+-------
 PG-13  |   223
 NC-17  |   210
 R      |   195
 PG     |   194
 G      |   178
(5 rows)

Examples with GROUP BY, LIMIT, Aggregate Functions

The following example uses the GROUP BY clause, LIMIT clause, and the sum() function to find the top 10 customers from the payment table.

SELECT customer_id, sum(amount) total
FROM payment
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
 customer_id | total
-------------+--------
         526 | 221.55
         148 | 216.54
         144 | 195.58
         178 | 194.61
         137 | 194.61
         459 | 186.62
         469 | 177.60
         468 | 175.61
         236 | 175.58
         181 | 174.66
(10 rows)

In this example, the steps of execution is as follows:

  1. First, the GROUP BY clause groups all rows by the customer_id column.
  2. Then, use the aggregate function sum(amount) to sum the amount columns of all rows in each group and use total as the column alias .
  3. Then, use the ORDER BY clause to sort groups in descending order by total.
  4. Finally, use the LIMIT 10 clause to return the first 10 record rows.

Examples with GROUP BY and HAVING

The following example uses GROUP BY clause, HAVING clause, and the sum() function to find customers with a total spend of more than $180 from the payment table.

SELECT customer_id, sum(amount) total
FROM payment
GROUP BY customer_id
HAVING sum(amount) > 180
ORDER BY total DESC;
 customer_id | total
-------------+--------
         526 | 221.55
         148 | 216.54
         144 | 195.58
         178 | 194.61
         137 | 194.61
         459 | 186.62
(6 rows)

In this example, the steps of execution is as follows:

  1. First, the GROUP BY clause group all rows by the customer_id column.
  2. Then, use the aggregate function sum(amount) to sum the amount columns of all rows in each group and use total as the column alias.
  3. Then, use the HAVING clause to filter the groups that meets sum(amount) is greater than 180.
  4. Finally, use the ORDER BY clause to sort groups in descending order.

The HAVING clause is used to filter the grouped data and it requires a logical expression as a condition, where the column name or expression in the logical expression can only be the column, expression, or aggregate function applied in the group clause.

Conclusion

In this article, we introduced the use of the GROUP BY clause group rows by specified columns or expressions.