SQLite cume_dist() Function

The SQLite cume_dist() function returns the cumulative distribution of the current row, that is, the ratio of the number of rows from the first row to the last row with the same value as the current row in the total number of rows in the partition.

The SQLite cume_dist() function is often used to display the highest or lowest percentage of records in a recordset. For example, the best 5% of students in this exam, etc.

cume_dist() Syntax

Here is the syntax of the SQLite cume_dist() function:

cume_dist()
OVER (
  [PARTITION BY partition_column_list]
  [ORDER BY order_column_list]
)

Parameters

partition_column_list

List of columns participating in the partition.

partition_column_list

List of columns involved in sorting.

Return value

The SQLite cume_dist() function returns a value greater than 0 and less than or equal to 1, which is the cumulative distribution of the current row. Its calculation formula is:

(the number of rows from the first row to the last row with the same value) / the total number of rows in the partition

cume_dist() Examples

Preparing Data

Use the following CREATE TABLE statement to create a table named student_grade to store grades of students:

CREATE TABLE student_grade (
  id INTEGER PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  class CHAR(1) NOT NULL,
  subject VARCHAR(20) NOT NULL,
  grade INT NOT NULL
);

This student_grade table has 5 columns as following:

  • id - The row ID, primary key.
  • name - The name of a student.
  • class - The class a student is in.
  • subject - The name of a subject.
  • grade - The grades for a subject and a student.

Insert some rows into the student_grade table using the following INSERT statement:

INSERT INTO student_grade
  (name, class, subject, grade)
VALUES
  ('Tim', 'A', 'Math', 9),
  ('Tom', 'A', 'Math', 7),
  ('Jim', 'A', 'Math', 8),
  ('Tim', 'A', 'English', 7),
  ('Tom', 'A', 'English', 8),
  ('Jim', 'A', 'English', 7),
  ('Lucy', 'B', 'Math', 8),
  ('Jody', 'B', 'Math', 6),
  ('Susy', 'B', 'Math', 9),
  ('Lucy', 'B', 'English', 6),
  ('Jody', 'B', 'English', 7),
  ('Susy', 'B', 'English', 8);

Use the following SELECT statement to show all rows in this table:

SELECT * FROM student_grade;
id  name  class  subject  grade
--  ----  -----  -------  -----
1   Tim   A      Math     9
2   Tom   A      Math     7
3   Jim   A      Math     8
4   Tim   A      English  7
5   Tom   A      English  8
6   Jim   A      English  7
7   Lucy  B      Math     8
8   Jody  B      Math     6
9   Susy  B      Math     9
10  Lucy  B      English  6
11  Jody  B      English  7
12  Susy  B      English  8

Example 1

The following statement uses the SQLite cume_dist() function to divide all rows into 2 partitions and calculate the cumulative distribution of each student’s grades:

SELECT *,
  cume_dist() OVER (
    PARTITION BY subject
    ORDER BY grade
  ) cume_dist
FROM student_grade;
id  name  class  subject  grade  cume_dist
--  ----  -----  -------  -----  -----------------
10  Lucy  B      English  6      0.166666666666667
4   Tim   A      English  7      0.666666666666667
6   Jim   A      English  7      0.666666666666667
11  Jody  B      English  7      0.666666666666667
5   Tom   A      English  8      1.0
12  Susy  B      English  8      1.0
8   Jody  B      Math     6      0.166666666666667
2   Tom   A      Math     7      0.333333333333333
3   Jim   A      Math     8      0.666666666666667
7   Lucy  B      Math     8      0.666666666666667
1   Tim   A      Math     9      1.0
9   Susy  B      Math     9      1.0

Note that the window function in the SQL statement above:

cume_dist() OVER (
  PARTITION BY subject
  ORDER BY grade
)

In the OVER clause,

  • The PARTITION BY subject partitions all rows by subject
  • The ORDER BY grade sorts all rows within each partition in ascending order by grade
  • The cume_dist() calculate the cumulative distribution for each row.

Note that within each subject, rows with the same grade have the same cumulative distribution.