PostgreSQL cume_dist() Function

The PostgreSQL cume_dist() function returns the cumulative distribution of the current row, that is, the ratio of the number of rows from the first row to the last row with the same value as the current row in the total number of rows in the partition.

The PostgreSQL cume_dist() function is often used to display the highest or lowest percentage of records in a recordset. For example, the best 5% of students in this exam, etc.

cume_dist() Syntax

Here is the syntax of the PostgreSQL cume_dist() function:

cume_dist()
OVER (
  [PARTITION BY partition_column_list]
  [ORDER BY order_column_list]
)

Parameters

partition_column_list

List of columns for partitioning.

partition_column_list

List of columns for sorting.

Return value

The PostgreSQL cume_dist() function returns a value greater than 0 and less than or equal to 1, which is the cumulative distribution of the current row. Its calculation formula is:

(the number of rows from the first row to the last row with the same value) / the total number of rows in the partition

cume_dist() Examples

Preparing Data

Use the following CREATE TABLE statement to create a table named student_grade to store grades of students:

CREATE TABLE student_grade (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  class CHAR(1) NOT NULL,
  subject VARCHAR(20) NOT NULL,
  grade INT NOT NULL
);

This student_grade table has 5 columns as following:

  • id - The row ID, primary key.
  • name - The name of a student.
  • class - The class a student is in.
  • subject - The name of a subject.
  • grade - The grades for a subject and a student.

Insert some rows into the student_grade table using the following INSERT statement:

INSERT INTO student_grade
  (name, class, subject, grade)
VALUES
  ('Tim', 'A', 'Math', 9),
  ('Tom', 'A', 'Math', 7),
  ('Jim', 'A', 'Math', 8),
  ('Tim', 'A', 'English', 7),
  ('Tom', 'A', 'English', 8),
  ('Jim', 'A', 'English', 7),
  ('Lucy', 'B', 'Math', 8),
  ('Jody', 'B', 'Math', 6),
  ('Susy', 'B', 'Math', 9),
  ('Lucy', 'B', 'English', 6),
  ('Jody', 'B', 'English', 7),
  ('Susy', 'B', 'English', 8);

Use the following SELECT statement to show all rows of this table:

SELECT * FROM student_grade;
 id | name | class | subject | grade
----+------+-------+---------+-------
  1 | Tim  | A     | Math    |     9
  2 | Tom  | A     | Math    |     7
  3 | Jim  | A     | Math    |     8
  4 | Tim  | A     | English |     7
  5 | Tom  | A     | English |     8
  6 | Jim  | A     | English |     7
  7 | Lucy | B     | Math    |     8
  8 | Jody | B     | Math    |     6
  9 | Susy | B     | Math    |     9
 10 | Lucy | B     | English |     6
 11 | Jody | B     | English |     7
 12 | Susy | B     | English |     8
(12 rows)

Example 1

The following statement uses a PostgreSQL cume_dist() function to divide all rows into 2 partitions and calculate the cumulative distribution of each student’s grades:

SELECT *,
  cume_dist() OVER (
    PARTITION BY subject
    ORDER BY grade
  )
FROM student_grade;
 id | name | class | subject | grade |      cume_dist
----+------+-------+---------+-------+---------------------
 10 | Lucy | B     | English |     6 | 0.16666666666666666
 11 | Jody | B     | English |     7 |  0.6666666666666666
  6 | Jim  | A     | English |     7 |  0.6666666666666666
  4 | Tim  | A     | English |     7 |  0.6666666666666666
  5 | Tom  | A     | English |     8 |                   1
 12 | Susy | B     | English |     8 |                   1
  8 | Jody | B     | Math    |     6 | 0.16666666666666666
  2 | Tom  | A     | Math    |     7 |  0.3333333333333333
  7 | Lucy | B     | Math    |     8 |  0.6666666666666666
  3 | Jim  | A     | Math    |     8 |  0.6666666666666666
  9 | Susy | B     | Math    |     9 |                   1
  1 | Tim  | A     | Math    |     9 |                   1
(8 rows)

Note that the window function in the SQL statement above:

cume_dist() OVER (
  PARTITION BY subject
  ORDER BY grade
)

In the OVER clause,

  • The PARTITION BY subject partitions all rows by subject
  • The ORDER BY grade sorts all rows within each partition in ascending order by grade
  • The cume_dist() calculate the cumulative distribution for each row.

Note that within each subject, rows with the same grade have the same cumulative distribution.