SQL Server NTILE() Function

In SQL Server, the NTILE() function can divide a result set into a specified number of buckets and assign each row to one of the buckets. This function is commonly used to calculate percentiles or to group data into a fixed number of groups.

Syntax

The syntax for the NTILE() function is as follows:

NTILE (integer_expression) OVER (
   [PARTITION BY partition_expression, ... ]
   ORDER BY sort_expression [ASC|DESC], ...
)

Here, integer_expression is the number of buckets to divide the data into, PARTITION BY and ORDER BY are optional and can be set as needed.

Usage

The NTILE() function is commonly used in the following scenarios:

  • Grouping: dividing data into a specified number of groups.
  • Percentile calculation: for example, dividing data into 4 buckets can calculate the 25th, 50th, and 75th percentiles.

Examples

Suppose we have the following Employees table:

EmployeeID FirstName LastName Salary
1 John Doe 50000
2 Jane Smith 60000
3 Bob Johnson 70000
4 Mike Lee 80000
5 Sarah Brown 90000
6 Tom Davis 100000
7 Karen Wilson 110000
8 Amy Anderson 120000
9 James Taylor 130000
10 Lisa Jackson 140000

The following example divides the data in the Employees table into 3 groups:

SELECT EmployeeID, FirstName, LastName, Salary,
       NTILE(3) OVER (ORDER BY Salary) AS GroupNumber
FROM Employees;

After executing the SQL statement above, we will get the following result:

EmployeeID FirstName LastName Salary GroupNumber
1 John Doe 50000 1
2 Jane Smith 60000 1
3 Bob Johnson 70000 1
4 Mike Lee 80000 2
5 Sarah Brown 90000 2
6 Tom Davis 100000 2
7 Karen Wilson 110000 3
8 Amy Anderson 120000 3
9 James Taylor 130000 3
10 Lisa Jackson 140000 3

We can see that the NTILE() function divides the data into 3 buckets and assigns each row to one of the buckets. The first bucket contains the lowest 3 salaries, the second bucket contains the middle 3 salaries, and the third bucket contains the highest 4 salaries.

Conclusion

The NTILE() function can divide a dataset into a specified number of groups, with each group containing roughly equal numbers of rows. It is frequently used in scenarios such as data analysis and report creation, particularly when data needs to be grouped and presented or grouped and calculated. By utilizing the NTILE() function, we can more flexibly control the number of groups and the number of data points within each group, thus better meeting our needs.