GROUP BY and HAVING
In the beginner level course, we saw how aggregate functions can be used to summarize metrics from a table. However, their real utility comes when we combine them with the
GROUP BY clause.
Let’s say we have a
sales table with the following information:
If we wanted to know the total amount sold by each employee with the tools we’ve learned so far, we would need a separate query for each of them.
In this case, this would mean that we would find the total amount sold by each employee separately. For example, to find the amount sold by the first employee, we could use the following query:
SELECT employeeID, SUM(amount) AS total_sold FROM employees WHERE employeeID = 1
WHERE clause filters the sales done by the first employee:
And then, the aggregation function finds the total amount sold for this smaller table. The output of the query would be:
If the table is too large, this approach is inconvenient, as it would need to find the total amount sold for each employee separately.
SQL allows us to solve this problem with a single query, using the
GROUP BY clause. The
GROUP BY clause lets us:
- Divide our table into smaller tables according to a single column.
- Perform an aggregate function over each of them.
- Integrate the results into a single table again.
In this case, the query we would use is:
SELECT employeeID, SUM(amount) AS total_sol FROM employees GROUP BY employeeID
Our output would be:
The logic of it works in the following way:
GROUP BYclause divides our original table into four smaller tables according to their
- Then, we select
SUM(amount) AS total_amountfrom each of the tables:
- Finally, the results are integrated into a single table:
In reality, the
GROUP BY clause does all those steps at once, and is more efficient than carrying them all out separately.
We can use the
GROUP BY clause with any aggregate function we need. We can specify more than one argument in the
GROUP BY clause by using a separating comma. The original tables will be divided into smaller groups for which all the columns in the
GROUP BY clause have the same values.
GROUP BY location_id, employee_id would divide a table into smaller groups that share the same
HAVING clause is a filter for the groups made by the
GROUP BY. It works similarly to the
WHERE clause, but it’s applied to groups instead of single rows.
For example, if we wanted to find the total amount sold for employees who performed more than one sale, we could do:
SELECT employeeID, SUM(amount) AS total_sold FROM employees GROUP BY employeeID HAVING COUNT(*) > 1
GROUP BY clause would divide our table into groups according to their
employeeID and then filter each of the groups according to our condition. In this case, it would discard the smaller tables 3 and 4, as they only have one row.
After filtering the groups, it continues just as the
GROUP BY clause does. It performs the aggregate functions in the
SELECT on each of the groups and integrates the results into a single table.
SELECT first_name, SUM(salary) FROM employees JOIN sales ON employee.id = sales.employeeID GROUP BY employee.id HAVING COUNT(*) > 1 WHERE salary > 7500 AND last_name != "Williams"
FROM-> Gets input tables
JOIN … ON-> Merges input tables
WHERE-> Filters rows according to condition
GROUP BY-> Separates into smaller tables
HAVING-> Filters smaller tables according to condition
SELECT-> Selects the columns we need (and integrates smaller tables)
SUM()and aggregate functions or numerical operations performed on output.
You have 32 sections remaining on this learning path.
Advance your learning journey! Go Premium and unlock 40+ hours of specialized content.