MySQL CUME_DIST() Function

The MySQL CUME_DIST() function returns the cumulative distribution of the current row, that is, the ratio of the number of rows from the first row to the last row with the same value as the current row in the total number of rows in the partition.

The MySQL CUME_DIST() function is often used to display the highest or lowest percentage of records in a recordset. For example, the best 5% of students in this exam, etc.

CUME_DIST() Syntax

Here is the syntax of the MySQL CUME_DIST() function:

CUME_DIST()
OVER (
  [PARTITION BY partition_column_list]
  [ORDER BY order_column_list]
)

Parameters

partition_column_list

List of columns for partitioning.

partition_column_list

List of columns for sorting.

Return value

The MySQL CUME_DIST() function returns a value greater than 0 and less than or equal to 1, which is the cumulative distribution of the current row. Its calculation formula is:

(the number of rows from the first row to the last row with the same value) / the total number of rows in the partition

CUME_DIST() Examples

Preparing Data

Use the following CREATE TABLE statement to create a table named student_grade to store grades of students:

CREATE TABLE student_grade (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  class CHAR(1) NOT NULL,
  subject VARCHAR(20) NOT NULL,
  grade INT NOT NULL
);

This student_grade table has 5 columns as following:

  • id - The row ID, primary key.
  • name - The name of a student.
  • class - The class a student is in.
  • subject - The name of a subject.
  • grade - The grades for a subject and a student.

Insert some rows into the student_grade table using the following INSERT statement:

INSERT INTO student_grade
  (name, class, subject, grade)
VALUES
  ('Tim', 'A', 'Math', 9),
  ('Tom', 'A', 'Math', 7),
  ('Jim', 'A', 'Math', 8),
  ('Tim', 'A', 'English', 7),
  ('Tom', 'A', 'English', 8),
  ('Jim', 'A', 'English', 7),
  ('Lucy', 'B', 'Math', 8),
  ('Jody', 'B', 'Math', 6),
  ('Susy', 'B', 'Math', 9),
  ('Lucy', 'B', 'English', 6),
  ('Jody', 'B', 'English', 7),
  ('Susy', 'B', 'English', 8);

Use the following SELECT statement to show all rows in this table:

SELECT * FROM student_grade;
+----+------+-------+---------+-------+
| id | name | class | subject | grade |
+----+------+-------+---------+-------+
|  1 | Tim  | A     | Math    |     9 |
|  2 | Tom  | A     | Math    |     7 |
|  3 | Jim  | A     | Math    |     8 |
|  4 | Tim  | A     | English |     7 |
|  5 | Tom  | A     | English |     8 |
|  6 | Jim  | A     | English |     7 |
|  7 | Lucy | B     | Math    |     8 |
|  8 | Jody | B     | Math    |     6 |
|  9 | Susy | B     | Math    |     9 |
| 10 | Lucy | B     | English |     6 |
| 11 | Jody | B     | English |     7 |
| 12 | Susy | B     | English |     8 |
+----+------+-------+---------+-------+
12 rows in set (0.00 sec)

Example 1

The following statement uses the MySQL CUME_DIST() function to divide all rows into 2 partitions and calculate the cumulative distribution of each student’s grades:

SELECT *,
  CUME_DIST() OVER (
    PARTITION BY subject
    ORDER BY grade
  ) "cume_dist"
FROM student_grade;
+----+------+-------+---------+-------+---------------------+
| id | name | class | subject | grade | CUME_DIST           |
+----+------+-------+---------+-------+---------------------+
| 10 | Lucy | B     | English |     6 | 0.16666666666666666 |
|  4 | Tim  | A     | English |     7 |  0.6666666666666666 |
|  6 | Jim  | A     | English |     7 |  0.6666666666666666 |
| 11 | Jody | B     | English |     7 |  0.6666666666666666 |
|  5 | Tom  | A     | English |     8 |                   1 |
| 12 | Susy | B     | English |     8 |                   1 |
|  8 | Jody | B     | Math    |     6 | 0.16666666666666666 |
|  2 | Tom  | A     | Math    |     7 |  0.3333333333333333 |
|  3 | Jim  | A     | Math    |     8 |  0.6666666666666666 |
|  7 | Lucy | B     | Math    |     8 |  0.6666666666666666 |
|  1 | Tim  | A     | Math    |     9 |                   1 |
|  9 | Susy | B     | Math    |     9 |                   1 |
+----+------+-------+---------+-------+---------------------+
12 rows in set (0.00 sec)

Note that the window function in the SQL statement above:

CUME_DIST() OVER (
  PARTITION BY subject
  ORDER BY grade
)

In the OVER clause,

  • The PARTITION BY subject partitions all rows by subject
  • The ORDER BY grade sorts all rows within each partition in ascending order by grade
  • The CUME_DIST() calculate the cumulative distribution for each row.

Note that within each subject, rows with the same grade have the same cumulative distribution.