Using SQL COUNT() with CASE WHEN: A Comprehensive Guide

Using SQL COUNT() with CASE WHEN: A Comprehensive Guide

Introduction

The SQL COUNT (CASE WHEN...) statement is a powerful tool in SQL programming that allows you to perform counts based on specified conditions. By leveraging conditional counting, you can efficiently retrieve, analyze, and aggregate data with a single query. In this article, we will explore syntax, usage, real-world examples, and advanced techniques of SQL’s COUNT (CASE WHEN...) statement and help you start using it in your queries.

Syntax Review

Before we get started with leveraging the power of COUNT (CASE WHEN...), let’s first review COUNT and CASE WHEN separately.

COUNT

In SQL, COUNT(COLUMN) is an aggregation function used to calculate all non-null) instances within a specified column. A more inclusive form called COUNT(*) can be used to count all the rows in a table, including null values.

CASE WHEN

In SQL, the CASE WHEN statement is a conditional expression that allows you to perform logical evaluations based on specified conditions and return relevant values. This is useful in conducting conditional transformations or aggregations of data.

COUNT (CASE WHEN...) is one such “conditional aggregation” we can perform with those simpler functions. Let’s explore how we are able to use it in SQL queries.

image

Explaining COUNT CASE WHEN

COUNT (CASE WHEN...) is known as a conditional count. This allows us to selectively count elements based on their values, unrestricted by their null values. By using the CASE WHEN statement within the COUNT function, we can define specific conditions and adjust the count only when those conditions are met. This criteria allows us to narrowly target the data in our analysis.

Creating Queries With SQL COUNT CASE WHEN

To understand the usage of SQL’s COUNT (CASE WHEN...) in practical scenarios, let’s take a look at the statement in action. Consider the following code snippet:

SELECT
COUNT (
CASE WHEN order_type = 'purchase' then 1
ELSE 0
END
) FROM ORDERS;

In this query, we utilize the COUNT (CASE WHEN...) statement to count the number of purchases within the “orders” table. From how we coded the query, we only count a row if it has the order_type of ‘purchase’. This can be expressed by tagging the result as 1. On the other hand, if it is not of type purchase, we can tag it as 0.

Let’s Explore The Following Cases

Let’s consider the following queries. Which of the following is logically equivalent to the query above? (Focus specifically on: COUNT(CASE WHEN order_type = 'purchase' then 1 END) ).

  • A: COUNT(CASE WHEN order_type = 'purchase' then 0 END)
  • B: COUNT(CASE WHEN order_type = 'purchase' then 2 END)
  • C: COUNT(CASE WHEN order_type = 'purchase' then NULL END)
  • D: COUNT(CASE WHEN order_type = 'purchase' then 1 ELSE 0 END)

In SQL, the COUNT() function only considers non-null values. The case statement inside the COUNT() function generates a value for each row in the dataset. The COUNT() function then counts the number of rows that do not have a NULL value generated by the case statement.

Given this understanding, the logically equivalent options to COUNT(CASE WHEN order_type = 'purchase' then 1 END) are:

  • A: COUNT(CASE WHEN order_type = 'purchase' then 0 END)
  • B: COUNT(CASE WHEN order_type = 'purchase' then 2 END)

These queries are equivalent because they also produce non-null values for each row where order_type = 'purchase', and hence will be counted by the COUNT() function.

The options that are not logically equivalent are:

  • C: COUNT(CASE WHEN order_type = 'purchase' then NULL END)

    This query is not equivalent because it produces NULL when order_type = 'purchase', and hence it will not be counted by the COUNT() function.

  • D: COUNT(CASE WHEN order_type = 'purchase' then 1 ELSE 0 END)

    This query is not equivalent because it produces a non-null value for every row in the dataset, not just those where order_type = 'purchase', and hence all rows will be counted by the COUNT() function, changing the result from the original query.

image

Real-World Examples Of COUNT CASE WHEN

Real-world examples are invaluable in helping us understand and apply concepts in practical ways. You may take different approaches with the COUNT (CASE WHEN...) when designing for unique business solutions and business logic, so familiarizing yourself now with patterns of use can make your work go by much quicker in the future.

Real-World Example: Counting Sheep

In your role on the logistics team at Happy Farms Inc., you are responsible for ensuring smooth operations and meeting customer demands. One of our esteemed buyers has expressed interest in purchasing a significant quantity of sheep products. You must ensure that we have sufficient stock to fulfill their order.

Let’s explore different ways to reach the solution:

Simple Classification-Based Counting

One of the most basic and commonly used problem scenarios when using COUNT (CASE WHEN...) is to count the number of instances of a classification.

To determine the availability of sheep products, you can swiftly retrieve the information from our comprehensive database through a simple query:

SELECT 
  COUNT(CASE WHEN product = 'sheep' THEN 1) 
FROM 
  products;

Anti-Thesis: The WHERE Argument

The use of COUNT (CASE WHEN...) as a part of our query logic might not be needed to perform such a simple task however. It might instead be more efficient, and easier to code, to use the WHERE clause instead. Take a look at the following query:

SELECT 
  COUNT(*) AS sheep_product_stock 
FROM 
  products 
WHERE 
  product = 'sheep';

As things become more complex, there are instances where the use of COUNT (CASE WHEN...) is not only beneficial but also necessary. Let’s see an iterated version of our sheep storage question for these use cases.

Complex Classification-Based Counting

When you need to count based on a single classification, the WHERE approach can be effective. However, real-world business challenges often necessitate more intricate queries, where counting based on multiple classifications becomes necessary. In such cases, relying solely on the WHERE clause can become cumbersome and sub-optimal. Instead, using COUNT (CASE WHEN...) can provide the more efficient solution.

Real-World Example: A Bigger Buyer

Happy Farms has received a new request. One of our esteemed buyers wishes to purchase various products, not just sheep. It is crucial to verify our stock availability to fulfill their order promptly. We will use COUNT (CASE WHEN...) to find the status of all categories of livestock.

SELECT 
  COUNT(CASE WHEN product = 'sheep' AND status = 'pending' THEN 1 END) 
		AS sheep_pending,
  COUNT(CASE WHEN product = 'sheep' AND status = 'shipped' THEN 1 END) 
		AS sheep_shipped,
  COUNT(CASE WHEN product = 'sheep' AND status = 'cancelled' THEN 1 END) 
		AS sheep_cancelled,
  COUNT(CASE WHEN product = 'cow' AND status = 'pending' THEN 1 END) 
		AS cow_pending,
  COUNT(CASE WHEN product = 'cow' AND status = 'shipped' THEN 1 END) 
		AS cow_shipped,
  COUNT(CASE WHEN product = 'cow' AND status = 'cancelled' THEN 1 END) 
		AS cow_cancelled,
  COUNT(CASE WHEN product = 'goat' AND status = 'pending' THEN 1 END) 
		AS goat_pending,
  COUNT(CASE WHEN product = 'goat' AND status = 'shipped' THEN 1 END) 
		AS goat_shipped,
  COUNT(CASE WHEN product = 'goat' AND status = 'cancelled' THEN 1 END) 
		AS goat_cancelled
FROM orders;

This query looks ugly, and there are certainly ways to make it more efficient by utilizing the GROUP BY clause. Let’s take a look at that now.

Group By With COUNT CASE WHEN

By applying the GROUP BY clause, we can streamline the query and enhance its performance. The grouping operation allows us to aggregate the counts based on common attributes, resulting in a cleaner and more concise query structure.

Let’s take a look at an improved version of the query using the GROUP BY clause:

SELECT 
  product,
  COUNT(CASE WHEN status = 'pending' THEN 1 END) AS count_pending,
  COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS count_shipped,
  COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS count_cancelled
FROM orders
GROUP BY product;

That’s a significant improvement! Because COUNT is an aggregate function we can use GROUP BY to generate more detailed and informative result sets.

The GROUP BY clause allows us to organize the data based on specific attributes, such as ‘product’ in this case, creating distinct groups within the dataset. Then, with COUNT, we can perform calculations within each group, such as counting the occurrences of different status categories like ‘pending,’ ‘shipped,’ and ‘cancelled.’

This combination of functions enables us to obtain a more granular view of our data, revealing the distribution and quantities of orders across different product categories.

image

Advanced COUNT CASE WHEN Techniques (With Interview Questions)

The COUNT(CASE WHEN...) statement may appear simple at first glance, but it offers immense power when used strategically. By employing advanced techniques, you can harness its full potential, especially when tackling interview questions. In this section, we will delve into effective strategies for utilizing COUNT(CASE WHEN...) efficiently and learn how to provide optimal answers to related queries.

NULL Counting: Ticket Agent Analysis

Consider a different approach to counting values using the CASE WHEN statement. Instead of counting non-null values, we can count instances of NULL. This technique can be useful in certain scenarios. Let’s explore it through the following interview question:

Question

The support team at your company is trying to analyze ticketing data to improve response times. They have noticed that a significant number of tickets have no assigned agent, making it difficult to track the progress of these tickets.

Write a SQL query to count the total number of tickets, and the number of tickets that are either assigned or unassigned to agents.

Note: Tickets that are not assigned to any agent have NULL in their agent_id field.

Example:

Input:

tickets table

tickets table

Column Type
id INTEGER
issue VARCHAR
agent_id INTEGER
created_at TIMESTAMP
updated_at TIMESTAMP

Output:

Column Type
total_tickets INTEGER
tickets_with_agent INTEGER
tickets_without_agent INTEGER

Solution

SELECT 
  COUNT(*) AS total_tickets,
  COUNT(agent_id) AS tickets_with_agent,
  COUNT(CASE WHEN agent_id IS NULL 1) AS tickets_without_agent
FROM 
  tickets;

Answering this question is relatively easy, and there are several ways to approach it. Nonetheless, one of the simplest techniques to tackle this question is by utilizing the COUNT(CASE WHEN...) method. By employing count CASE WHEN, we can reverse the usual behavior of the count function and instead count the number of NULL values using the conditional CASE WHEN agent_id IS NULL 1.

Nested COUNT CASE WHEN: Player Analysis

In some scenarios, you may need to nest CASE WHEN statements within COUNT to perform more complex data analysis. This technique is particularly useful when you need to count based on multiple conditions.

Let’s explore this through an example:

Question

You are a data analyst at a videogame production company and you’ve been asked to analyze the player’s behaviors. Specifically, they want to know the count of players who have played a number of games that is less than 10 but more than 5, as well as the count of players who have played 10 or more games.

The players table is as follows:

Column Type
id INTEGER
name VARCHAR
games_played INTEGER
registered_at TIMESTAMP

Write a SQL query to obtain the needed information.

Output:

Solution


SELECT
  COUNT(CASE WHEN games_played > 5 AND games_played < 10 THEN 1 END) AS players_more_than_5_to_10_games,
  COUNT(CASE WHEN games_played >= 10 THEN 1 END) AS players_10_plus_games
FROM
  players;

In this query, we use nested CASE WHEN conditions inside the COUNT function. When the conditions are met, it returns 1 in the new column, and the COUNT function then reveals how many players there are total in that category.

image

Test Yourself

Having learned the basics of COUNT (CASE WHEN...) it’s now time to test yourself with an interview question. The following question has many approaches, but try to solve it with COUNT (CASE WHEN...)

Employee Salaries

Given an employees and departments table, select the top 3 departments with at least 10 employees and rank them according to the percentage of their employees making over $100,000 in salary.

Example:

Input:

employees table

Columns Type
id INTEGER
first_name VARCHAR
last_name VARCHAR
salary INTEGER
department_id INTEGER

departments table

Columns Type
id INTEGER
name VARCHAR

Output:

Column Type
percentage_over_100k FLOAT
department_name VARCHAR
number_of_employees INTEGER

image