SQL (Structured Query Language) is one of the most important querying languages in use today to access and transform relational databases. A common SQL problem statement is to perform a database manipulation based on conditional logic.
CASE WHEN statement provides flexibility in expressing conditional logic as it can handle multiple conditions. To perform conditional aggregation, the
SUM function combined with
CASE WHEN in SQL will achieve optimized results.
In this article, we’ll briefly touch upon the SUM() and CASE() functions, some common use cases, and how the two functions can be nested. We will also explore some real-life examples for practice and provide a few useful tips to ensure error-free, optimized code.
CASE function will help you transform data in SQL with conditional logic, much like an IF-ELSE statement.
CASE expression goes through several conditions and returns a value when the first condition is met. So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the
Let’s say you’ve been asked to categorize employees based on their yearly pay.
Here is the query to achieve the given operation.
SELECT EmployeeName, Salary, CASE WHEN salary >= 50000 AND salary < 70000 THEN 'Grade A' WHEN salary >= 70000 AND salary < 90000 THEN 'Grade B' WHEN salary >= 90000 THEN 'Grade C' ELSE 'Grade D' END AS Grade FROM employees;
Here is the desired result:
|1||John Doe||55000||Grade A|
|2||Jane Smith||60000||Grade A|
|3||Alice Johnson||120000||Grade B|
|4||Bob Williams||45000||Grade D|
|5||Eva Davis||120000||Grade C|
In this example, the
CASE statement handles multiple conditions to categorize employees into different pay grades. Each
WHEN condition checks a specific range of salaries and assigns the corresponding pay grade. The
ELSE statement handles all other conditions that are not included in the
SUM function in SQL is an aggregate function used for calculating the total sum of values in a specific column. It is most often used with the
GROUP BY clause to calculate totals within groups. It can also be used along with
DISTINCTto calculate the sum of distinct values in a column or with other arithmetic operations such as
COUNTto perform complex calculations.
NULL values. If a row contains a
NULL value in the column being summed, it is not included in the calculation. This feature can help when handling missing data.
CASE WHEN allows you to perform conditional summation in SQL. It’s particularly useful when you want to aggregate values based on specific conditions. Consider it the equivalent of “pivoting” data - combining multiple rows by category and an aggregate function.
SUM(CASE WHEN condition THEN value ELSE alternative END)
In this syntax:
condition: The criteria that must be met for the
THENclause to execute.
value: The value to sum if the condition is met.
THEN: Specifies the result if the condition evaluates to true.
ELSE alternative: If the condition is not met, this value is used. Often set to 0 so it doesn’t affect the sum.
END: Concludes the
SELECT SUM(CASE WHEN column_name > 100 THEN column_name ELSE 0 END) AS conditional_sum FROM table_name;
In this example, we’re summing all the values in
column_name where each value is greater than 100. If a value is 100 or less, it does not contribute to the sum.
Now for the above illustration, the same result can be achieved more efficiently by excluding the rows with values 100 or less with the
WHERE clause. However, combining SUM with CASE-WHEN is necessary when:
SUMallows you to differentiate the sums based on varying conditions. It enables you to avoid multiple queries or subqueries and allows for easier readability and maintenance of reporting.
WHEREclause will change the count of the rows returned. With
CASE WHEN, you can maintain the integrity of the row count, which is particularly useful in reports where you want to show both the total and the conditional sums.
GROUP BYor window functions,
CASE WHENcan produce aggregates over partitions of the data, which isn’t possible with
WHEREalone when multiple clauses need to be satisfied.
Here are some real-life scenarios where these two functions should be nested:
Let us explore some real-world examples on this topic with written code.
1. You’re working with an e-commerce company. You want to calculate the total sales amount for two categories of items: ‘Electronics’ and ‘Clothing’, as well as a total for all other categories. Use
CASE WHEN to solve the problem.
Let’s say your sales table looks like this:
This is the code to pivot the data by sales category:
SELECT SUM(CASE WHEN category = 'Electronics' THEN amount ELSE 0 END) AS ElectronicsSales, SUM(CASE WHEN category = 'Clothing' THEN amount ELSE 0 END) AS ClothingSales, SUM(CASE WHEN category NOT IN ('Electronics', 'Clothing') THEN amount ELSE 0 END) AS OtherSales FROM sales;
Here is the expected output:
You can practice a similar interview problem here: Monthly Product Sales
2. Let’s look at a slightly more complex problem: SUM with CASE WHEN and the GROUP BY clause. Suppose we have a table of transactions with each transaction belonging to different departments within a company. We want to calculate the total spend for ‘IT’, ‘HR’, and ‘Marketing’, and also have a total for ‘Other’ departments, grouped by quarters.
Let us assume that you are supplied with the following transactions table.
We will group by the quarters of the transaction dates. The quarters are:
Here is the code to achieve this transformation.
SELECT CASE WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 1 AND 3 THEN 'Q1' WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 4 AND 6 THEN 'Q2' WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 7 AND 9 THEN 'Q3' WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 10 AND 12 THEN 'Q4' END AS Quarter, SUM(CASE WHEN department = 'IT' THEN amount ELSE 0 END) AS ITSales, SUM(CASE WHEN department = 'HR' THEN amount ELSE 0 END) AS HRSales, SUM(CASE WHEN department = 'Marketing' THEN amount ELSE 0 END) AS MarketingSales, SUM(CASE WHEN department NOT IN ('IT', 'HR', 'Marketing') THEN amount ELSE 0 END) AS OtherSales FROM transactions GROUP BY Quarter;
The above query uses the
CASE WHEN statement in two ways; one, simply to categorize the transaction dates by quarters, and the other way is by nesting the sum and case statement to aggregate the spend by department. The output from the above query would look like this:
This table shows the total sales amount for each department, grouped by the fiscal quarter. The ‘OtherSales’ column includes sales from departments not listed specifically (like ‘Finance’ in the transactions table).
3. HR Analytics Case Study: In this example, we’ll look at
CASE WHEN in addition to an arithmetic operation. Let’s consider a scenario where we have an employees table, and we want to calculate the total salary expenditure for different job titles while also considering overtime. We want to sum up the regular salaries and the overtime payments separately and also include a total sum for each job title.
The following employees table is provided to you.
While calculating the total compensation per job role, we’ll include the
CASE WHEN statement to apply conditional logic to distinguish between job types and compute remuneration accordingly. Here is the code to get the desired result:
SELECT job_title, SUM(salary) AS TotalSalaries, SUM(overtime_hours * overtime_rate) AS TotalOvertimePayments, SUM( salary + (overtime_hours * overtime_rate) + (CASE WHEN job_title = 'Software Developer' THEN 500 WHEN job_title = 'Graphic Designer' THEN 300 ELSE 200 END) ) AS TotalCompensation FROM employees GROUP BY job_title;
The output from the above query will now include the bonus in the total compensation:
CASE WHEN statement in the query adds a conditional bonus to total compensation by job type based on their job title.
4. We’ll look at one last example - the
CASE WHEN statement when used with a logical operator. An electronics retail company tracks its sales and wants to run a special performance analysis.
They aim to categorize sales into different bonus eligibility groups based on the amount of sales and the region where the sale occurred. The conditions for categorization are:
The company wants a report that sums these amounts by region to help with financial planning and bonus allocations.
Here is the sales table:
And here is the code to get the required output.
SELECT region, SUM(sale_amount) AS TotalSales, SUM( CASE WHEN sale_amount > 2000 OR region = 'East' THEN sale_amount ELSE 0 END ) AS PremiumSales, SUM( CASE WHEN sale_amount BETWEEN 1000 AND 2000 AND region != 'East' THEN sale_amount ELSE 0 END ) AS StandardSales, SUM( CASE WHEN region = 'West' OR (sale_date BETWEEN '2023-06-01' AND '2023-07-31') THEN sale_amount ELSE 0 END ) AS PromotionalSales FROM sales GROUP BY region;
The output would group sales according to the company’s specifications:
Likewise, you can combine the SUM and CASE statements with other logical operators such as BETWEEN and AND in a similar way to incorporate conditional logic in similar scenarios.
CASE WHEN construct is quite standard in SQL and is typically supported across most SQL databases with no variation in syntax.
However, in MySQL, you have the option to use the IF() function. It works similarly to programming languages’ IF-ELSE statements and can only be used in SELECT statements, WHERE clauses, and other places where you would use a MySQL expression.
Here’s how you might nest an
IF() function with
SUM() to perform conditional summation.
SELECT SUM(IF(category IN ('A', 'B'), amount, 0)) AS TotalSales FROM sales;
While the IF() function has an easier syntax to follow, it is important to remember
CASE is part of the SQL standard and is supported by virtually all SQL-compliant RDBMS, whereas
IF() is specific to MySQL. If you want to ensure that your skills are transferable, you should prefer
CASE can handle multiple conditions in a sequence making it more versatile. So, it is generally recommended to use
IF-ELSE even when querying in MySQL.
It is important to remember the following caveats to make your code error-free and efficient:
CASEexpression is significant, as the conditions are evaluated in the order they are written. Once a condition is met, the
CASEexpression will return the corresponding result and not evaluate any subsequent conditions. This downside is important to bear in mind to ensure that your query produces the correct result.
CASEexpression within an aggregate function like
SUMmust be carefully written to handle
NULLvalues appropriately, as SUM disregards rows with NULLs.
WHEREclause to limit the rows whenever possible before implementing
CASElogic. This can help the query planner to use an index to filter rows first, thus reducing the number of rows over which the
CASEneeds to be evaluated.
We hope that this article has helped you understand the different ways to combine the SUM and
CASE-WHEN functions to perform powerful transformations in SQL.
For an end-to-end overview of SQL, including complex joins, advanced reporting, and creating multi-table databases, you can explore our specialized learning path.