How to Use SQL PARTITION BY Clause (with Examples)

How to Use SQL PARTITION BY Clause (with Examples)

Overview

Aggregating data is one of the most common actions in SQL. These operations include calculating averages, minimums, and maximums. There are different ways of aggregating data in SQL, but using the PARTITION BY clause is one of the most useful.

The GROUP BY function may be simpler, but as you’ll see below, its implementation is limited when you need to compare many records without losing sight of individual ones. This makes it harder to compare individual entries against aggregates.

In this article, we’ll explain the PARTITION BY clause and how you can use it when querying databases.

What Does PARTITION BY Do in SQL?

The PARTITION BY() clause in SQL is a window function used to define how records should be separated, i.e., partitioned, before an aggregation is performed. It must be used inside the OVER() clause.

For example, in a table with the names, genders, and ages of employees, PARTITION BY can be used to separate the records based on gender before the average or maximum ages for each gender are calculated.

What Is the Difference Between PARTITION BY and GROUP BY?

Going by the above example, PARTITION BY seems no different from GROUP BY. However, GROUP BY is intended for quick aggregations and reporting. The output is, therefore, a summary of the results from aggregating groups of records. Individual rows of records are condensed into a few rows representing each group, as shown below.

SQL Partition By Example Row

The equivalent process using the PARTITION BY clause results in an aggregation that is shown alongside individual records, as shown below.

SQL Partition By Record

This gives you the ability to perform operations without losing sight of individual records. Even details such as user names and the age of each person can be displayed, making it easier to compare an individual’s data to the aggregate.

SQL Partition By Aggregate

The Over() Clause

The PARTITION BY clause can only be used inside the OVER clause. The OVER clause is what defines the window within which window functions are performed. It’s also what preserves the row-wise data and can be used without the PARTITION BY clause, as shown below in the average age column.

SQL Partitio By Over Clause

The Over() Clause

The PARTITION BY clause can only be used inside the OVER clause. The OVER clause is what defines the window within which window functions are performed. It’s also what preserves the row-wise data and can be used without the PARTITION BY clause, as shown below in the average age column.

How to Use PARTITION BY

Using PARTITION BY requires understanding the syntax and the different operations that can benefit from having a partitioned dataset.

Syntax

The syntax for using PARTITION BY is as follows:

SELECT column1, column2,…, WINDOW FUNCTION(column_name1) OVER(PARTITION BY(column_name2)

Multiple columns in the original data set can be selected alongside the partitioned column. Column_name1 represents the column that will have an aggregation or other function performed on it, while column_name2 is the column that will be used to group rows of data before the window function is performed on each separate group.

Aggregate Functions and Other Clauses

Aggregate functions are commonly paired with PARTITION BY. Some calculations performed by aggregate functions include:

  • Maximums - MAX()
  • Averages - AVG()
  • Minimums - MIN()
  • Counting - COUNT()
  • Summing - SUM()

In addition to aggregate functions, there are other functions used with PARTITION BY for other purposes. Some examples include:

  • ORDER BY - This clause is used to arrange the rows in each partition in ascending or descending order. Using this clause will affect how some window functions, e.g., SUM(), are applied.
  • RANK - This window function is used to assign ranks to records based on some quantifiable criteria. If two rows have the same rank, one number is skipped when assigning the next rank, i.e., 1,2,3,3,5,6,…
  • DENSE_RANK - Similar to the RANK function except, if two records have the same rank, the next record is assigned the next numerical value without skipping one, i.e., 1,2,3,3,4,5,…

Example 1: Employee salary

Problem

You have been given a table with columns for employee IDs, first names, last names, job roles, salaries, and department IDs. Compare the salaries of employees with the average, maximum, and minimum salaries in their respective departments.

Solution

Create the table containing employee salary information.

CREATE TABLE employee_salary (
  employee_id INT NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  occupation VARCHAR(50),
  salary INT,
  dept_id INT
);

Insert values into the table above.

INSERT INTO employee_salary (employee_id, first_name, last_name, occupation, salary, dept_id)
VALUES
(1,'George', 'Bishop', 'Senior Software Engineer', 145000, 1),
(2,'Tom', 'Ludley', 'Software Engineer', 105000, 1),
(3,'Rebecca', 'McDouglas', 'Software Engineer', 101000, 1),
(4, 'Jerry', 'Astley', 'Office Secretary', 50000, 2),
(5, 'Angela', 'Warren', 'Human Resource Manager', 65000, 3),
(6, 'Donald', 'Duckwing', 'Data Engineer', 70000, 4),
(7, 'Sandy', 'Perkins', 'Data Scientist', 100000, 4),
(8, 'Rick', 'Jones', 'Assistant Human Resource Manager', 45000, 3),
(9, 'Wyatt', 'Ridge', 'Office Adminstrator', 45000, 2),
(10, 'Dwight', 'Langley', 'Receptionist', 40000, 2),
(11, 'Mark', 'Austine', 'Senior Data Analyst', 100000, 4),
(12, 'Page', 'Brooks', 'Data Analyst', 60000, 4)

Use the PARTITION BY clause to return a table with first names, last names, and salaries. The table will also contain average, maximum, and minimum salaries per department.

SELECT first_name, last_name, dept_id, salary,  
AVG(salary) OVER(PARTITION BY dept_id) AS AVG_SAL,
MAX(salary) OVER(PARTITION BY dept_id) AS MAX_SAL,
MIN(salary) OVER(PARTITION BY dept_id) AS MIN_SAL 
FROM employee_salary;

This resulting table is shown below.

Example 2: Salary Ranking

Problem

In addition to the table in example 1, you have been provided with a second table containing demographic information, including age and gender. Write a query that groups the male and female employees separately and ranks them from the highest to the lowest earners.

Solution

Create the table for employee demographics and add data. Note that the employee ID in this table MUST be the same as the employee ID in the salaries table.

CREATE TABLE employee_basics (
  employee_id INT NOT NULL,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  age INT,
  gender VARCHAR(10),
  birth_date DATE,
  PRIMARY KEY (employee_id)
);

INSERT INTO employee_basics (employee_id, first_name, last_name, age, gender, birth_date)
VALUES
(1,'George', 'Bishop', 51, 'Male','1973-03-25'),
(2,'Tom', 'Ludley', 36, 'Male', '1988-01-04'),
(3,'Rebecca', 'McDouglas', 41, 'Female', '1982-06-04'),
(4, 'Jerry', 'Astley', 29, 'Male', '1994-06-30'),
(5, 'Angela', 'Warren', 33, 'Female', '1990-10-13'),
(6, 'Donald', 'Duckwing', 40, 'Male', '1983-08-30'),
(7, 'Sandy', 'Perkins', 45, 'Female', '1978-12-15'),
(8, 'Rick', 'Jones', 43, 'Male', '1981-03-22'),
(9, 'Wyatt', 'Ridge', 23, 'Male', '2001-04-26'),
(10, 'Dwight', 'Langley', 39, 'Male', '1985-03-02'),
(11, 'Mark', 'Austine', 40, 'Male', '1984-06-30'),
(12, 'Page', 'Brooks', 26, 'Female', '1998-03-27');

Write the code to join the two tables and rank male and female employees based on salaries.

SELECT sal.first_name, sal.last_name, occupation, salary, gender,  
RANK() OVER(PARTITION BY gender ORDER BY salary DESC) AS Pay_Rank
FROM employee_salary AS sal
INNER JOIN employee_basics AS dem
	ON sal.employee_id = dem.employee_id
;

The resulting table should appear as shown below.

Since the RANK clause was used and not the DENSE_RANK, the pay rank for male employees goes from 6 to 8.

FAQs

1. What is ROW_NUMBER() in PARTITION BY?

This is a window function that gives each row a number. The number for each row in a partition is unique, but numbers restart at 1 in the next partition.

Unlike the RANK() function, the row numbers are not assigned based on a quantitative measure and simply label the rows from 1 to the number corresponding to the last value in each partition.

2. Why use SQL PARTITION BY?

The PARTITION BY clause has many advantages. It organizes records into meaningful groups while producing useful aggregates. There is less data to be processed at any time since the calculations for each partition are handled separately. The flexibility of this clause makes it easier to handle complicated calculations inside SQL.

3. Is DENSE_RANK the same as ROW_NUMBER?

The results from the DENSE_RANK function often look identical to that of the ROW_NUMBER function. However, similar to the RANK function, the results from DENSE_RANK come from a comparison of data in a specific column, while those assigned by ROW_NUMBER are not the result of a comparison.

Conclusion

PARTITION BY is an incredibly useful clause in SQL when you’re working with window functions. It makes it easy to calculate different metrics while retaining the original rows of data making comparison easy. The syntax is also relatively easy to understand and use.

Interview Query offers a wide range of resources to prepare you for your next tech interview. Our question bank contains SQL questions for different experience levels, and we also offer interview guides for specific tech positions in leading companies. You can also visit our blog for other useful articles on SQL and other topics.

If you’d like to use SQL for data aggregation but find the GROUP BY clause insufficient, the PARTITION BY clause will supercharge your efforts.