How the RANK() function works in MariaDB

The RANK() function is a window function in MariaDB that returns the rank of a row within a partition of a result set.

RANK() assigns a rank to each row within its partition of the result set. The rank is determined by the ORDER BY clause of the window definition: the rank of a row is one plus the number of rows in its partition that sort strictly before it. If two or more rows have the same value, they are assigned the same rank, and the ranks that would otherwise follow are skipped, so the sequence can contain gaps. The function can be used for calculations and analysis involving ranks, such as finding the top or bottom values in a group, or the percentile of a value.

Syntax

The syntax of the RANK() function is as follows:

RANK() OVER (window_definition)

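The function itself takes no arguments, so the parentheses after RANK are always empty. The ranking is controlled by the window definition in the OVER clause: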

  • window_definition: A window definition that specifies the partitioning and ordering of the result set. The window definition can include the following clauses:
    • PARTITION BY: This clause divides the result set into partitions based on the values of one or more expressions. The RANK() function is applied to each partition separately. The PARTITION BY clause is optional. If it is omitted, the entire result set is treated as a single partition.
    • ORDER BY: This clause specifies the order of the rows within each partition based on the values of one or more expressions. The RANK() function assigns ranks to the rows according to this order. The ORDER BY clause is mandatory. The expressions can be followed by ASC or DESC to indicate the ascending or descending order, respectively. The default order is ascending.
    • ROWS or RANGE: A frame clause restricts a window function to a subset of the rows in the partition, defined by a physical offset (ROWS) or a logical offset (RANGE) from the current row. It does not apply to RANK(): the rank is always computed over the whole partition, and MariaDB rejects a frame clause on RANK() with an error instead of ignoring it. Leave this clause out when using RANK().
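
The effect of PARTITION BY and of the sort direction can be seen in a minimal sketch like the one below. The inline rows and the column names grp and v are assumptions made up for illustration; they do not come from any table in this article.

```sql
-- Hypothetical inline data: four (grp, v) pairs used only for illustration.
SELECT grp, v,
       -- No PARTITION BY: the whole result set is one partition.
       RANK() OVER (ORDER BY v DESC)                  AS rank_overall,
       -- PARTITION BY grp: ranking restarts at 1 inside each group.
       RANK() OVER (PARTITION BY grp ORDER BY v DESC) AS rank_in_group
FROM (
    SELECT 'a' AS grp, 10 AS v
    UNION ALL SELECT 'a', 30
    UNION ALL SELECT 'b', 20
    UNION ALL SELECT 'b', 40
) AS t
ORDER BY grp, v DESC;
-- grp  v   rank_overall  rank_in_group
-- a    30  2             1
-- a    10  4             2
-- b    40  1             1
-- b    20  3             2
```

The outer ORDER BY only fixes the display order of the rows; the ranks themselves come from the ORDER BY inside each OVER clause.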

The function returns an integer value that represents the rank of the row within the partition of the result set, as follows:

  • The function assigns ranks to the rows according to the order of the values in the ORDER BY clause of the window definition. The rank of a row is one plus the number of rows in its partition that sort strictly before it.
  • If two or more rows have the same value, they are assigned the same rank, and the next rank is skipped. For example, if the values are 1, 2, 2, 3, the ranks are 1, 2, 2, 4.
  • A ROWS or RANGE frame does not change the result, because RANK() always considers every row in the partition. In MariaDB, adding a frame such as ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING to RANK() raises an error.
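
The counting rule and the tie behavior described above can be checked with a small sketch. It assumes a MariaDB version that supports common table expressions; the CTE name t, the column v, and the alias earlier are hypothetical, and the values 1, 2, 2, 3 are the ones used in the list above.

```sql
-- Hypothetical inline data: the values 1, 2, 2, 3.
WITH t AS (
    SELECT 1 AS v
    UNION ALL SELECT 2
    UNION ALL SELECT 2
    UNION ALL SELECT 3
)
SELECT t.v,
       RANK() OVER (ORDER BY t.v) AS rnk,
       -- Same rule written by hand: one plus the number of rows
       -- that sort strictly before the current row.
       1 + (SELECT COUNT(*) FROM t AS earlier WHERE earlier.v < t.v) AS rnk_by_count
FROM t
ORDER BY t.v;
-- v  rnk  rnk_by_count
-- 1  1    1
-- 2  2    2
-- 2  2    2
-- 3  4    4
```

The hand-written count is there only to make the definition concrete; in practice RANK() is the simpler way to write it.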

Examples

Example 1: Finding the rank of sales by product

The following example finds the rank of sales by product using the RANK() function. The table sales contains the product name, the product category, and the sales amount for each product. The query uses the RANK() function to rank the products by sales amount in descending order, and the PARTITION BY clause to rank each product category separately. The query returns the product name, the product category, the sales amount, and the rank of sales for each product.

SELECT product_name, product_category, sales_amount,
RANK() OVER (PARTITION BY product_category ORDER BY sales_amount DESC) AS rank_of_sales
FROM sales;

The output is:

+--------------+------------------+--------------+---------------+
| product_name | product_category | sales_amount | rank_of_sales |
+--------------+------------------+--------------+---------------+
| Laptop       | Electronics      |        50000 |             1 |
| TV           | Electronics      |        40000 |             2 |
| Camera       | Electronics      |        30000 |             3 |
| Phone        | Electronics      |        20000 |             4 |
| Book         | Books            |         5000 |             1 |
| Magazine     | Books            |         3000 |             2 |
| Newspaper    | Books            |         2000 |             3 |
| Pen          | Stationery       |         1000 |             1 |
| Pencil       | Stationery       |          500 |             2 |
| Eraser       | Stationery       |          200 |             3 |
+--------------+------------------+--------------+---------------+

The output shows that the RANK() function ranks the products by sales amount in descending order within each product category, restarting at 1 for each category. For example, the laptop has the highest sales amount in the electronics category, so it has a rank of 1, and the TV has the second highest, so it has a rank of 2. The book and the pen have the highest sales amounts in the books and stationery categories, so each has a rank of 1. This sample data contains no ties, so the ranks in every category are consecutive: 1, 2, 3, 4 for electronics and 1, 2, 3 for books and stationery. If two products in the same category had the same sales amount, they would share a rank and the next rank would be skipped. For example, if two books both had sales of 5000, both would have a rank of 1 and the next book would have a rank of 3.
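
To try this example, the table can be set up roughly as follows. The column types are assumptions, since the article does not give the table definition; the rows are taken from the output above. The second query is a sketch of a common use of RANK(), keeping only the top two products per category. Because a window function cannot appear in a WHERE clause, the ranking is computed in a derived table (given the hypothetical alias ranked) and filtered outside it.

```sql
-- Assumed schema: the column types are a guess, not part of the original example.
CREATE TABLE sales (
    product_name     VARCHAR(50),
    product_category VARCHAR(50),
    sales_amount     INT
);

INSERT INTO sales (product_name, product_category, sales_amount) VALUES
    ('Laptop',    'Electronics', 50000),
    ('TV',        'Electronics', 40000),
    ('Camera',    'Electronics', 30000),
    ('Phone',     'Electronics', 20000),
    ('Book',      'Books',        5000),
    ('Magazine',  'Books',        3000),
    ('Newspaper', 'Books',        2000),
    ('Pen',       'Stationery',   1000),
    ('Pencil',    'Stationery',    500),
    ('Eraser',    'Stationery',    200);

-- Top two products per category: rank inside a derived table, filter outside.
SELECT product_name, product_category, sales_amount, rank_of_sales
FROM (
    SELECT product_name, product_category, sales_amount,
           RANK() OVER (PARTITION BY product_category
                        ORDER BY sales_amount DESC) AS rank_of_sales
    FROM sales
) AS ranked
WHERE rank_of_sales <= 2
ORDER BY product_category, rank_of_sales;
```

With the rows above this should keep Book and Magazine, Laptop and TV, and Pen and Pencil. If a category had a tie at rank 2, both tied products would be returned, so the filter can yield more than two rows per category.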

Example 2: Finding the rank of scores by student

The following example finds the rank of scores by student using the RANK() function. The table scores contains the student name and the score for each student. The query uses the RANK() function to rank the students by score in ascending order. There is no PARTITION BY clause, so the whole result set is a single partition, and there is no frame clause, because RANK() always ranks over the entire partition and MariaDB does not allow a frame with it. The query returns the student name, the score, and the rank of score for each student.

SELECT student_name, score,
RANK() OVER (ORDER BY score ASC) AS rank_of_score
FROM scores;

The output is:

+--------------+-------+---------------+
| student_name | score | rank_of_score |
+--------------+-------+---------------+
| Alice        |    50 |             1 |
| Bob          |    60 |             2 |
| Charlie      |    60 |             2 |
| David        |    80 |             4 |
| Eve          |    90 |             5 |
+--------------+-------+---------------+

The output shows that the RANK() function ranks the students by score in ascending order. Alice has the lowest score of 50, so she has a rank of 1. Bob has the second lowest score of 60, so he has a rank of 2. Charlie has the same score as Bob, so he also has a rank of 2. Because two students share rank 2, rank 3 is skipped: David, with a score of 80, has three students ahead of him and therefore a rank of 4. Eve has the highest score of 90, so she has a rank of 5.
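
The tie handling in this example can be reproduced with a sketch like the following. The table definition is an assumption, and the rows are chosen so that Bob and Charlie share a score of 60, as the explanation describes. The query uses no frame clause, since RANK() ranks over the whole partition.

```sql
-- Assumed schema: the column types are a guess, not part of the original example.
CREATE TABLE scores (
    student_name VARCHAR(50),
    score        INT
);

INSERT INTO scores (student_name, score) VALUES
    ('Alice',   50),
    ('Bob',     60),
    ('Charlie', 60),
    ('David',   80),
    ('Eve',     90);

SELECT student_name, score,
       RANK() OVER (ORDER BY score ASC) AS rank_of_score
FROM scores
ORDER BY score, student_name;
-- Alice 1, Bob 2, Charlie 2, David 4, Eve 5 (rank 3 is skipped after the tie)
```

The outer ORDER BY is only there to make the display order predictable; it does not affect the ranks.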

There are some other functions that are related to the RANK() function, such as:

  • DENSE_RANK(): This function is similar to the RANK() function, but it does not skip any ranks if there are ties. The syntax of the function is DENSE_RANK() OVER (window_definition), where window_definition is the same as in the RANK() function. The function returns an integer value that represents the dense rank of the row within the partition of the result set. For example, if the values are 1, 2, 2, 3, the ranks are 1, 2, 2, 4, and the dense ranks are 1, 2, 2, 3.
  • ROW_NUMBER(): This function returns the sequential number of a row within a partition of a result set. The syntax of the function is ROW_NUMBER() OVER (window_definition), where window_definition is the same as in the RANK() function. Unlike RANK(), it gives tied rows different numbers, in an unspecified order among the tied rows.
  • NTILE(): This function divides the rows of a partition into a specified number of buckets that are as equal in size as possible, and returns the bucket number of each row. The syntax of the function is NTILE(number) OVER (window_definition), where number is an integer expression that specifies the number of buckets, and window_definition is the same as in the RANK() function. For example, if the number of buckets is 4 and the values are 1, 2, 3, 4, 5, 6, 7, 8, the bucket numbers are 1, 1, 2, 2, 3, 3, 4, 4.
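
The differences between these functions are easiest to see side by side. The sketch below runs all four over the same hypothetical inline values 1, 2, 2, 3; the column aliases are made up for illustration, and NTILE(2) is an arbitrary choice of two buckets.

```sql
-- Hypothetical inline data: the values 1, 2, 2, 3.
SELECT v,
       RANK()       OVER (ORDER BY v) AS rnk,        -- 1, 2, 2, 4 (gap after the tie)
       DENSE_RANK() OVER (ORDER BY v) AS dense_rnk,  -- 1, 2, 2, 3 (no gap)
       ROW_NUMBER() OVER (ORDER BY v) AS row_num,    -- 1, 2, 3, 4 (ties get distinct numbers)
       NTILE(2)     OVER (ORDER BY v) AS bucket      -- two buckets of two rows each
FROM (
    SELECT 1 AS v
    UNION ALL SELECT 2
    UNION ALL SELECT 2
    UNION ALL SELECT 3
) AS t
ORDER BY v, row_num;
```

The two rows with the value 2 are indistinguishable to the sort, so which of them gets row number 2 and which gets 3 is not defined. Here it makes no visible difference because the rows are identical.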

Conclusion

The RANK() function returns the rank of a row within a partition of a result set. The rank is determined by the ORDER BY clause of the window definition: the rank of a row is one plus the number of rows in its partition that sort strictly before it. If two or more rows have the same value, they are assigned the same rank, and the next rank is skipped. The function takes no arguments; the partitioning and ordering are specified by the window definition in the OVER clause, and no frame clause is used. The function returns an integer value that can be used for calculations and analysis involving ranks, such as finding the top or bottom values, or the percentile of a value. It can also be used alongside other window functions, such as DENSE_RANK(), ROW_NUMBER(), and NTILE(), to perform more complex operations on ranks.